Psychometrika 


VOLUME XX, 1955
JANUARY-DECEMBER 





Editorial Council 


Chairman: HAROLD GULLIKSEN          Managing Editor: DOROTHY C. ADKINS

Editors: M. W. RICHARDSON           Assistant Managing Editor: B. J. WINER
         PAUL HORST


Editorial Board 


R. L. ANDERSON        BERT F. GREEN            GEORGE A. MILLER
T. W. ANDERSON        J. P. GUILFORD           WM. G. MOLLENKOPF
J. B. CARROLL         HAROLD GULLIKSEN         FREDERICK MOSTELLER
H. S. CONRAD          PAUL HORST               GEORGE E. NICHOLSON
L. J. CRONBACH        ALSTON S. HOUSEHOLDER    M. W. RICHARDSON
E. E. CURETON         LYLE V. JONES            WM. STEPHENSON
ALLEN EDWARDS         TRUMAN L. KELLEY         R. L. THORNDIKE
MAX D. ENGELHART      ALBERT K. KURTZ          LEDYARD TUCKER
WM. K. ESTES          FREDERIC M. LORD         D. F. VOTAW, JR.
HENRY E. GARRETT      IRVING LORGE             S. S. WILKS
                      QUINN MCNEMAR





GODFREY THOMSON L. L. THURSTONE 














PUBLISHED QUARTERLY 


By THE PSYCHOMETRIC SOCIETY 


AT 1407 SHERWOOD AVENUE 
RICHMOND 5, VIRGINIA 










Psychometrika


A JOURNAL DEVOTED TO THE DEVEL- 
OPMENT OF PSYCHOLOGY AS A 
QUANTITATIVE RATIONAL SCIENCE 


















































THE PSYCHOMETRIC SOCIETY, ORGANIZED IN 1935








VOLUME 20 
NUMBER 1 


MARCH
1955











PSYCHOMETRIKA, the official journal of the Psychometric Society, is devoted to the
development of psychology as a quantitative rational science. Issued four times a year, on March 15,
June 15, September 15, and December 15.


MARCH, 1955, VOLUME 20, NUMBER 1


Published by the Psychometric Society at 1407 Sherwood Avenue, Richmond 5, Virginia.
Entered as second class matter at the Post Office of Richmond, Virginia. Editorial Office, 
Department of Psychology, The University of North Carolina, Chapel Hill, North Carolina. 


Subscription Price: The regular subscription rate is $14.00 per volume. The subscriber 
receives each issue as it comes out, and, upon request, a second complete set for binding at 
the end of the year. All annual subscriptions start with the March issue and cover the calendar
year. All back issues but two are available. Back issues are $14.00 per volume (one set
only) or $3.50 per issue, with a 20 per cent discount to Psychometric Society Members. 
Members of the Psychometric Society pay annual dues of $7.00, of which $6.30 is in payment 
of a subscription to Psychometrika. Student members of the Psychometric Society pay 
annual dues of $4.00, of which $3.60 is in payment for the journal. 


Application for membership and student membership in the Psychometric Society, together 
with a check for dues for the calendar year in which application is made, should be sent to 


MELVIN D. DAVIDOFF
Test Development Section, U.S. Civil Service Commission, Washington 25, D. C.


Payments: All bills and orders are payable in advance. 
Checks covering membership dues should be made payable to the Psychometric Society. 


Checks covering regular subscriptions to Psychometrika (for non-members of the Psycho- 
metric Society) and back issue orders should be made payable to the Psychometric Corpora- 
tion. All checks, notices of change of address, and business communications should be sent to 


DAVID R. SAUNDERS, Treasurer, Psychometric Society and Psychometric Corporation
Educational Testing Service 

P.O. Box 592 

Princeton, New Jersey 


Copies not received because of failure to report change of address will be replaced, upon 
request, and will be billed at the appropriate back issue rate. Allow at least six weeks for 
notification of change of address to take effect, sending notice to the Treasurer of the 
Psychometric Corporation. 
Articles on the following subjects are published in Psychometrika: 
(1) the development of quantitative rationale for the solution of psychological 
problems; 
(2) general theoretical articles on quantitative methodology in the social and bio- 
logical sciences; 
(3) new mathematical and statistical techniques for the evaluation of psychological 
data; 
(4) aids in the application of statistical techniques, such as nomographs, tables, 
work-sheet layouts, forms, and apparatus; 
(5) critiques or reviews of significant studies involving the use of quantitative tech- 
niques. 


The emphasis is to be placed on articles of type (1), in so far as articles of this type are 
available. 
(Continued on the back inside cover page) 
















Psychometrika 








CONTENTS 


SAMPLING FLUCTUATIONS RESULTING FROM THE SAMPLING OF TEST ITEMS
FREDERIC M. LORD

SEPARATION OF DATA AS A PRINCIPLE IN FACTOR ANALYSIS
CHESTER W. HARRIS

THE CHOICE OF AN ERROR TERM IN ANALYSIS OF VARIANCE DESIGNS
ARNOLD BINDER

A RATIONAL CURVE RELATING LENGTH OF REST PERIOD AND LENGTH OF SUBSEQUENT WORK PERIOD FOR AN ERGOGRAPHIC EXPERIMENT
LEDYARD R TUCKER

A MEASURE OF INTERRELATIONSHIP FOR OVERLAPPING GROUPS
B. J. WINER

AN EXTENSION OF ANDERSON’S SOLUTION FOR THE LATENT STRUCTURE EQUATIONS
W. A. GIBSON

A FACTOR ANALYSIS OF MENTAL ABILITIES AND PERSONALITY TRAITS
J. C. DENTON AND C. W. TAYLOR

A TABULAR METHOD OF OBTAINING TETRACHORIC r WITH MEDIAN-CUT VARIABLES
GEORGE S. WELSH

AN IBM METHOD FOR COMPUTING INTRASERIAL CORRELATIONS
M. C. PAYNE, JR. AND L. STAUGAS








VOLUME TWENTY MARCH 1955 NUMBER 1 








COOPERATIVE GRADUATE SUMMER SESSIONS IN STATISTICS 


The University of Florida, North Carolina State College, Virginia Polytechnic Insti- 
tute, and the Southern Regional Education Board are jointly sponsoring a series of co- 
operative summer sessions in statistics. The first session was held in 1954 at Virginia Poly- 
technic Institute. The second session will be held at the University of Florida from June 
20 to July 29, 1955. A session is scheduled to be held at North Carolina State College in 1956, 
and another at Virginia Polytechnic Institute in 1957. 

The sessions, based upon a recommendation of the Southern Regional Education 
Board’s Advisory Commission on Statistics, will be of interest to (1) research and pro- 
fessional workers desiring instruction in basic statistical concepts, (2) teachers desiring 
training in modern statistics, (3) prospective candidates for graduate degrees in statistics, 
(4) graduate students in other fields wanting training in statistics, and (5) statisticians 
who wish to keep informed about advanced specialized theory and methods. 

Each session lasts six weeks and each course carries approximately three semester 
hours of graduate credit. The program may be entered at any session; consecutive courses 
will follow in successive summers. The summer work may be applied as residence credit 
at any one of the cooperating institutions, as well as certain other institutions, in partial 
fulfillment of the requirements for a master’s degree. The catalog requirements for the 
degree must be met at the degree-granting institutions. Doctoral candidates should consult 
with their institutions regarding the applicability of the courses. The faculty for the 1955 
session will include: 


R. L. Anderson, N. C. State College      G. E. Nicholson, Jr., Univ. of N. C.
D. B. Duncan, Univ. of Fla.              P. J. Rulon, Harvard Univ.
B. Harshbarger, Va. Poly. Inst.          W. L. Smith, Univ. of N. C.
C. E. Marshall, Okla. A. and M.          D. B. South, Univ. of Fla.
H. A. Meyer, Univ. of Fla.


Courses to be offered are: 


Statistical Methods I              Theory of Sampling
Statistical Methods II             Theory of Statistical Inference
Design of Experiments              Statistical Research in Psychology and Education
Statistical Theory I               Mathematics for Statistics
Statistical Theory II              Seminar in Recent Advances in Statistics
Inference and Least Squares
Advanced Analysis I


The tuition fee is $35 for the six-weeks term. The holder of a doctorate degree, upon 
acceptance, may register without the payment of tuition. Living and other expenses at 
the University are reasonable. Inquiries should be addressed to Professor Herbert A. Meyer, 
Statistical Laboratory, University of Florida, Gainesville, Florida. 














PSYCHOMETRIKA, VOL. 20, NO. 1
MARCH, 1955 


SAMPLING FLUCTUATIONS RESULTING FROM THE 
SAMPLING OF TEST ITEMS* 


FREDERIC M. LORD


EDUCATIONAL TESTING SERVICE 


Sampling fluctuations resulting from the sampling of test items rather 
than of examinees are discussed. It is shown that the Kuder-Richardson 
reliability coefficients actually are measures of this type of sampling fluctua- 
tion. Formulas for certain standard errors are derived; in particular, a simple 
formula is given for the standard error of measurement of an individual 
examinee’s score. A common misapplication of the Wilks-Votaw criterion 
for parallel tests is pointed out. It is shown that the Kuder-Richardson 
formula-21 reliability coefficient should be used instead of the formula-20 
coefficient in certain common practical situations. 


1. Introduction 


Suppose that the same test is administered to a large number of separate 
groups of examinees, the groups being random samples all drawn from the 
same population; and suppose that some test statistic is computed separately 
for each sample of examinees. The value obtained for this test statistic will, 
of course, differ from sample to sample because of sampling fluctuations. 
The standard deviation of these values over a very large number of samples 
is the standard error of the test statistic when examinees are sampled. For 
convenience, this type of sampling will be referred to as type-1 sampling. 

On the other hand, suppose that a large number of forms of the same test 
are administered to the same group of examinees, each form consisting of a 
random sample of items drawn from a common population of items; and 
suppose that some test statistic is computed separately for each form of the 
test. Let us assume for theoretical purposes that the examinees do not change 
in any way during the course of testing, i.e., that there is no practice effect, 
no fatigue, etc. The value computed for the test statistic will still, of course, 
differ from form to form because of sampling fluctuations. The standard 
deviation of these values over a very large number of samples is the standard 
error of the test statistic when the test items are sampled. This type of sampling 
will be referred to as type-2 sampling. Test forms constructed by type-2
sampling will be called randomly parallel forms or randomly parallel tests. 
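
The idea of type-2 sampling can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the item pool, its size, and the response-generating rule are hypothetical, and items are drawn with replacement to mimic a very large pool. It builds many randomly parallel forms for one fixed group of examinees and compares the observed standard deviation of the form means with the value implied by the spread of item difficulties in the pool.

```python
import random
import statistics


def form_mean_score(form):
    """Mean number-right score of the fixed examinee group on one form.

    form: list of items; each item is a tuple of 0/1 responses, one per examinee.
    """
    n_examinees = len(form[0])
    scores = [sum(item[a] for item in form) for a in range(n_examinees)]
    return sum(scores) / n_examinees


def simulate_type2_se_of_mean(pool, n_items, n_forms, seed=0):
    """Standard deviation of the mean score over randomly parallel forms."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_forms):
        # Draw items with replacement, approximating an infinite item pool.
        form = [rng.choice(pool) for _ in range(n_items)]
        means.append(form_mean_score(form))
    return statistics.pstdev(means)


# Hypothetical pool: 400 items answered by the same 30 examinees.
# Responses are fixed once generated; only the choice of items is random.
_rng = random.Random(1)
N_EXAMINEES = 30
pool = []
for _ in range(400):
    difficulty = _rng.uniform(0.2, 0.9)
    pool.append(tuple(1 if _rng.random() < difficulty else 0
                      for _ in range(N_EXAMINEES)))

n_items = 30
simulated_se = simulate_type2_se_of_mean(pool, n_items, n_forms=2000)

# The mean score on a form equals the sum of its item difficulties, so its
# standard deviation over forms is sqrt(n) times the SD of difficulty in the pool.
pool_p = [sum(item) / N_EXAMINEES for item in pool]
theoretical_se = (n_items ** 0.5) * statistics.pstdev(pool_p)

print(simulated_se, theoretical_se)
```

The examinee group never changes inside the loop, so all of the variation in the form means comes from the sampling of items.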

Type-1 sampling fluctuations are familiar to everyone; type-1 standard
error formulas have long been available; they are sometimes incorrectly used
in situations where sampling of test items is of crucial importance. Formulas
for type-1 and type-2 standard errors may usually be readily distinguished
on a superficial level by the following characteristics, which underscore the
essential difference between them: type-1 standard errors are usually obviously
proportional to some power (positive or negative) of the number of examinees
in the sample and are usually much less obviously and simply related, if at
all, to the number of items in the test; type-2 standard errors have the
corresponding characteristic with respect to the number of items in the sample.

*Most of the work reported here was carried out under contract with the Office of
Naval Research. The writer is indebted to Professor S. S. Wilks, who has checked over
certain critical portions of a draft of this paper.

Section 2 of the present paper summarizes notation and lists type-2 
standard error formulas without proof. Section 3 discusses two practical 
illustrative situations in which sampling of items is of crucial importance. 
Section 4 investigates the relation between the standard errors of individual 
examinees’ scores and the Kuder-Richardson reliability coefficients, and 
reaches some important conclusions regarding the formula-21 coefficient. 
Section 5 discusses certain familiar formulas, including the Wilks-Votaw 
criterion for parallel tests, in relation to type-2 sampling formulas. Section 
6 shows that the type-2 sampling distribution of most test statistics will be
approximately normal when the number of test items is sufficiently large. 
Section 7 gives the derivation of the type-2 standard errors presented in 
section 2. Section 8, finally, discusses simultaneous sampling of items and 
examinees (type-12 sampling) and derives certain standard error formulas 
appropriate for this more complicated situation. 


2. Notation and Summary of Formulas 
In the present study, standard errors are obtained for the following
test statistics:

$t_a$ = the observed test score of examinee $a$, obtained by counting the number of items
    answered correctly on a single test.
$\bar{t}$ = the mean of the scores obtained by the $N$ examinees on a single test.
    $\bar{t} = \sum_a t_a / N$.
$s_t$ = the standard deviation of the scores obtained by the $N$ examinees on a single test.
    $s_t^2 = \sum_a t_a^2 / N - \bar{t}^2$.
$r_{21}$ = the Kuder-Richardson reliability coefficient, formula 21.
    $r_{21} = \frac{n}{n-1}\left[1 - \frac{\bar{t}(n - \bar{t})}{n s_t^2}\right]$.
$r_{20}$ = the Kuder-Richardson reliability coefficient, formula 20.
    $r_{20} = \frac{n}{n-1}\left[1 - \frac{\sum_i p_i q_i}{s_t^2}\right]$
    (symbols explained in the succeeding list).
$r_{ct}$ = the correlation of the test score with any external variable, $c$. $r_{ct} = s_{ct}/s_c s_t$.






















Considerable care in defining notation must be taken here in order to 
avoid serious confusion. Additional symbols that will be used are listed below 
for easy reference. 

$x_{ia}$ = the "score" of examinee $a$ on item $i$.
    $x_{ia} = 1$ if item answered correctly
    $x_{ia} = 0$ otherwise.
$n$ = the number of items in a single form of a test, i.e., in a single sample. The subscript
    $i$ runs from 1 to $n$.
$N$ = the number of examinees in a single group of examinees. The subscript $a$ runs from
    1 to $N$.
$m$ = the number of items in a finite population of items.
$p_i$ = the observed "difficulty" of item $i$ for the $N$ examinees tested.
    $p_i = \sum_a x_{ia}/N$.
    $q_i = 1 - p_i$.
$z_a$ = the "proportion-correct score" of examinee $a$; the proportion of the items in a single
    test answered correctly by examinee $a$. $z_a = t_a/n$.
$\bar{z}$, $\bar{t}$, etc. = the mean of the $N$ values of $z$, $t$, etc.
    $\bar{z} = \sum_a z_a/N$, etc.
$M(p)$ = the mean of the $n$ observed values of $p_i$ for the $n$ items in the test administered.
    $M(p) = \sum_i p_i/n$.
$s_c$, $s_z$, etc. = the standard deviation of the $N$ values of $c$, $z$, etc.
    $s_c^2 = \sum_a c_a^2/N - \bar{c}^2$, etc.
$s_i$ = the standard deviation of $x_{ia}$ for fixed $i$.
    $s_i^2 = \sum_a x_{ia}^2/N - (\sum_a x_{ia}/N)^2 = p_i q_i$.
$s_{ct}$, etc. = the covariance (over examinees) of $c$ and $t$, etc.
    $s_{ct} = s_c s_t r_{ct} = \sum_a (c_a - \bar{c})(t_a - \bar{t})/N$.
$s_{ic}$, $s_{iz}$, $s_{it}$ = the covariance (over examinees) of $c_a$, $z_a$, or $t_a$, respectively, with $x_{ia}$, for
    fixed $i$.
    $s_{it} = s_i s_t r_{it} = \sum_a (x_{ia} - p_i)(t_a - \bar{t})/N$.
$s(p)$ = the standard deviation of the $n$ observed values of $p_i$ for the $n$ items in the test
    administered.
    $s^2(p) = \sum_i p_i^2/n - M^2(p)$.
$s(s_{ic})$, $s(s_{it})$, etc. = the standard deviation of the $n$ observed values of $s_{ic}$, $s_{it}$, etc. for
    the $n$ items in the test administered.
    $s^2(s_{ic}) = \sum_i s_{ic}^2/n - (\sum_i s_{ic}/n)^2$.
$s(s_{ic}, s_{it})$ = the covariance (over items) of $s_{ic}$ and $s_{it}$.
    $s(s_{ic}, s_{it}) = \sum_i s_{ic} s_{it}/n - (\sum_i s_{ic}/n)(\sum_i s_{it}/n)$.
$r_{ic}$, $r_{it}$, $r_{iz}$ = the correlation of $c_a$, $t_a$, or $z_a$, respectively, with $x_{ia}$, for fixed $i$. $r_{iz} = s_{iz}/s_i s_z$.














TABLE 1
Standard Errors of Test Statistics

Columns: Statistic; Type 2 (Sampling test items); Type 1 (Sampling examinees).
[Table entries are not recoverable from this copy.]
*Not known to writer.
















It should be noted that all the statistics in the foregoing list are observed 
sample statistics relating to a given sample. There are two kinds of statistics 
listed, typified, in the simplest case, by $\bar{z} = \sum_a z_a/N$ and $M(p) = \sum_i p_i/n$.
Population parameters have not been listed but will be designated, when 
needed, by the use of Greek letters. The following additional symbols, relating 
to the totality of all possible samples of test items (type-2 sampling), will 
be used. 


$E(x)$ = the expected value of $x$; the arithmetic mean of the statistic $x$ over all possible
    samples.
S.E.$(x)$ = the standard error of the statistic $x$; the standard deviation of the statistic $x$
    over all possible samples. $\mathrm{S.E.}^2(x) = E(x^2) - [E(x)]^2$.
var$(x)$ = the sampling variance. $\mathrm{var}(x) = \mathrm{S.E.}^2(x)$.
cov$(x, y)$ = the sampling covariance of the statistics $x$ and $y$ over all possible samples.
    $\mathrm{cov}(x, y) = E(xy) - E(x)E(y)$.


Table 1 summarizes the more important of the type-2 standard errors 
derived in the present paper. For purposes of comparison, the last column of 
the table, when appropriate, gives the corresponding usual type-1 formulas 
for the standard error for the case where the test scores are assumed to be 
normally distributed. The standard error formulas in both columns are large- 
sample formulas, in general, and observable sample statistics have been 
substituted for the corresponding population values throughout. 

Type-12 standard errors are not listed here; their treatment is left for 
a special section. 


3. Illustrative Examples and Discussion of the Standard Errors 


Suppose that Form A of a certain 135-item test has been administered. 
Several parallel forms of this same test are to be administered in the future. 
Each form is administered to a different group of examinees. The groups of 
examinees may be considered as random samples drawn from the same 
population. Each group is so large that differences between groups due to 
sampling of examinees may be ignored. It is found that the mean, standard 
deviation, and Kuder-Richardson formula-20 reliability of the scores on 
Form A are 63.5, 21.5, and 0.95, respectively. How much may we expect 
the means to vary from form to form? 

The required value of $s(p)$ could, of course, be determined directly from
item analysis data. However, this value can be calculated, by means of (1),
from the three numerical values given in the preceding paragraph. (1) is
readily obtained by solving for $s^2(p)$ in Tucker’s modification (9) of the usual
formula for the Kuder-Richardson formula-20 reliability coefficient.

$$s^2(p) = \frac{s_t^2}{n}\left(\frac{n-1}{n}\, r_{20} - 1\right) + \frac{\bar{t}(n - \bar{t})}{n^2}. \tag{1}$$

We find that $s^2(p) = .0538$.
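
The arithmetic can be checked with a few lines of code. This is a minimal sketch: the function name is hypothetical, and it assumes that (1) is the rearrangement of the formula-20 coefficient in which the sum of $p_i q_i$ is written as $\bar{t}(n - \bar{t})/n - n s^2(p)$.

```python
def item_difficulty_variance(n, t_bar, s_t, r20):
    """Variance of item difficulties, s^2(p), recovered from test-level statistics.

    n: number of items; t_bar: mean score; s_t: standard deviation of scores;
    r20: Kuder-Richardson formula-20 reliability.
    """
    return (s_t ** 2 / n) * ((n - 1) / n * r20 - 1) + t_bar * (n - t_bar) / n ** 2


# Values quoted for Form A in the text.
s2_p = item_difficulty_variance(n=135, t_bar=63.5, s_t=21.5, r20=0.95)
print(round(s2_p, 4))  # 0.0538
```

The two terms nearly cancel (about 0.2491 and 0.1953), so the result is sensitive to rounding in the reported reliability.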













The large-sample estimate of the type-2 standard error of the mean is
found to be $\mathrm{S.E.}_2(\bar{t}) = 2.7$. (The subscript “2” is used here, and the
subscript “1” is used below, to indicate type-2 and type-1 standard errors,
respectively. Hereafter, type-2 sampling will be understood, unless otherwise
specifically indicated.) If the same test were administered to random groups
of 135 examinees, the type-1 standard error would be $\mathrm{S.E.}_1(\bar{t}) = 1.8$.

On the basis of the foregoing, we may expect that parallel forms of the
test would not differ from each other in mean score by as much as $2\sqrt{2}\,\mathrm{S.E.}_2(\bar{t})$
= 7.6 points more than one time in twenty. If the parallel forms are carefully
constructed by matching items from form to form on difficulty and item-test 
correlation rather than by random sampling of items, it may well be that 
the forms will not differ from each other as much as the foregoing formulas 
would indicate. 
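
The figures in this example can be reproduced as follows. The sketch assumes that the large-sample type-2 standard error of the mean is $\sqrt{n}\, s(p)$ and that the type-1 standard error is $s_t/\sqrt{N}$; these forms are assumptions here, chosen because they agree with the quoted values. The factor $\sqrt{2}$ converts the standard error of one mean into that of a difference between two independent means, and the factor 2 is the approximate two-sided 5 per cent point of the normal distribution.

```python
import math

n_items = 135       # items per form
n_examinees = 135   # examinees per group in the type-1 comparison
s_t = 21.5          # standard deviation of scores on Form A
s2_p = 0.0538       # variance of item difficulties found from (1)

# Type-2 standard error of the mean (sampling of items).
se2_mean = math.sqrt(n_items) * math.sqrt(s2_p)

# Type-1 standard error of the mean (sampling of examinees).
se1_mean = s_t / math.sqrt(n_examinees)

# Difference between two form means expected to be exceeded about 1 time in 20.
critical_difference = 2 * math.sqrt(2) * se2_mean

print(round(se2_mean, 1))             # 2.7
print(round(se1_mean, 2))             # 1.85, reported as 1.8 in the text
print(round(critical_difference, 1))  # 7.6
```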

Suppose, for example, it is desired to investigate the relation of length of 
reading passage to validity in a reading comprehension test. The experimenter 
might well select at random from a pool of all available reading items of some 
specified difficulty level (a) a sample of all items based on passages containing 
more than 200 words and (b) a sample based on passages containing less than 
100 words (it is assumed here that there is only one item per reading passage). 
He then places these items in random order and administers them to a group 
of examinees, obtaining separate scores for the long and for the short items. 
He computes the validity of each score, using some available criterion. If 
the two validity coefficients differ by little more than the type-2 standard 
error of their difference, it seems likely that the difference is attributable to 
chance fluctuations due to the sampling of items. If they differ by several 
times this standard error, the opposite conclusion may be reached; insofar 
as other uncontrolled experimental variables are ruled out, the difference may 
plausibly be attributed to length of reading passage. 

A note of caution is necessary in using the type-2 standard error formulas. 
These formulas involve no assumptions beyond random sampling and large 
n; however, it is not at present known just how large an n is needed in any given 
case. The formulas in Table 1, therefore, should be used with some caution. This 
is particularly true of the last three rows of the table, since the correlation 
coefficients given in the first column undoubtedly have sharply skewed 
distributions when n is small. 

It should, finally, be noted that the assumption of random sampling
of items cannot be expected to hold for speeded tests, and the formulas given 
in the present paper must be considered inapplicable. 


4. Standard Errors of Measurement and Test Reliability 


Table 1 gives a practical approximation to S.E.$(t_a)$ in terms of observed
sample statistics; the rigorously accurate value, as shown in a later section, is
given by

$$\mathrm{S.E.}^2(t_a) = \frac{1}{n}\,\tau_a(n - \tau_a). \tag{2}$$


Here $\tau_a = E(t_a)$ is the true score of examinee $a$, i.e., the expected value of $t_a$
over all randomly parallel forms of the test. [The expectation symbol, $E$,
denotes the mean value over all type-2 samples; thus the operator $E$ can be
treated by the same rules as a summation sign.] The standard error of the
score of an examinee is the standard deviation of the errors of measurement
of his score (error of measurement = $t_a - \tau_a$). The average, taken over all
examinees, of the squared values of such standard deviations of errors of
measurement,

$$\frac{1}{N}\sum_a \mathrm{S.E.}^2(t_a) = \frac{1}{nN}\sum_a \tau_a(n - \tau_a), \tag{3}$$

may appropriately be compared with the conventional “standard error of
measurement” of test theory. This latter, which will be denoted by “S.E.
Meas.,” is likewise an average over all examinees. It is conventionally defined
by the formula

$$\text{S.E. Meas.} = s_t\sqrt{1 - \text{reliability}}. \tag{4}$$


Specifically, it will now be shown that the squared standard error of
measurement given by (3) is exactly equal to that which would be expected
in (4) if the test reliability there were given by the Kuder-Richardson
formula-21 coefficient (6). In our notation, this coefficient is

$$r_{21} = \frac{n}{n-1}\cdot\frac{s_t^2 - \bar{t}(1 - \bar{t}/n)}{s_t^2}. \tag{5}$$


The significance of the present proof is that it shows that the Kuder-Richardson 
formula-21 coefficient (and, as will be seen, the formula-20 coefficient also) is no 
more nor less than a measure of the magnitude of type-2 sampling errors (relative, 
of course, to the magnitude of true score differences). 

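Averaging (2) over all examinees, we find

$$\frac{1}{N}\sum_a \mathrm{S.E.}^2(t_a) = \frac{1}{nN}\sum_a \tau_a(n - \tau_a) = \frac{1}{N}\sum_a \tau_a - \frac{1}{nN}\sum_a \tau_a^2 = \bar{\tau} - \frac{1}{n}(\sigma_\tau^2 + \bar{\tau}^2), \tag{6}$$

where $\bar{\tau}$ and $\sigma_\tau^2$ denote the mean and the variance of the true scores of the $N$ examinees.

From (5) and (4), the expected value of the squared S.E. Meas. is

$$E[s_t^2(1 - r_{21})] = \frac{1}{n-1}\,E(n\bar{t} - \bar{t}^2 - s_t^2). \tag{7}$$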








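In order to deal with (7) we first need expressions for $E(s_t^2)$ and $E(\bar{t}^2)$:

$$E(s_t^2) = E\left[\frac{1}{N}\sum_a (t_a - \bar{t})^2\right] = \frac{1}{N}\,E\left[\sum_a \{(t_a - \tau_a) + (\tau_a - \bar{\tau}) - (\bar{t} - \bar{\tau})\}^2\right]. \tag{8}$$

After squaring and rearranging $E$ and $\sum$ signs,

$$E(s_t^2) = \frac{1}{N}\Big[\sum_a E\{(t_a - \tau_a)^2\} + \sum_a (\tau_a - \bar{\tau})^2 + N E\{(\bar{t} - \bar{\tau})^2\} + 2\sum_a (\tau_a - \bar{\tau})E(t_a - \tau_a) - 2E\{(\bar{t} - \bar{\tau})\sum_a (t_a - \tau_a)\} - 2E\{(\bar{t} - \bar{\tau})\sum_a (\tau_a - \bar{\tau})\}\Big]. \tag{9}$$

Now the fourth and the last terms on the right vanish since $E(t_a - \tau_a)$ and
$\sum_a (\tau_a - \bar{\tau})$ both equal zero. It is seen that we have, term for term,

$$E(s_t^2) = \frac{1}{N}\sum_a \mathrm{var}(t_a) + \sigma_\tau^2 + \mathrm{var}(\bar{t}) + 0 - 2\,\mathrm{var}(\bar{t}) - 0. \tag{10}$$

Now var$(t_a)$ is given by (2), so that

$$E(s_t^2) = \frac{1}{nN}\sum_a \tau_a(n - \tau_a) + \sigma_\tau^2 - \mathrm{var}(\bar{t}). \tag{11}$$

Finally, proceeding as in (6), we have

$$E(s_t^2) = \bar{\tau} - \frac{1}{n}\bar{\tau}^2 + \frac{n-1}{n}\sigma_\tau^2 - \mathrm{var}(\bar{t}). \tag{12}$$

Next, by the definition of var$(\bar{t})$,

$$E(\bar{t}^2) = \mathrm{var}(\bar{t}) + \bar{\tau}^2. \tag{13}$$

From (7), (12), and (13),

$$E[s_t^2(1 - r_{21})] = \frac{1}{n-1}\left[n\bar{\tau} - \mathrm{var}(\bar{t}) - \bar{\tau}^2 - \bar{\tau} + \frac{1}{n}\bar{\tau}^2 - \frac{n-1}{n}\sigma_\tau^2 + \mathrm{var}(\bar{t})\right] = \bar{\tau} - \frac{1}{n}\bar{\tau}^2 - \frac{1}{n}\sigma_\tau^2. \tag{14}$$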


This result is the same as that in (6). We have shown that the average squared 
standard error of measurement found in type-2 sampling is exactly equal to 
the expected value of the squared S.E. Meas. derived from the formula-21 
Kuder-Richardson reliability coefficient. 
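
This equality can be confirmed exactly on a small case by enumerating every possible form. The sketch below is an illustration under stated assumptions: the pool of four item kinds and the three examinees are hypothetical, all kinds are taken to be equally frequent in an infinite pool, and the quantity $s_t^2(1 - r_{21})$ is computed in the algebraically equivalent form $(n\bar{t} - \bar{t}^2 - s_t^2)/(n-1)$ so that forms with zero score variance cause no division by zero.

```python
from fractions import Fraction
from itertools import product


def check_kr21_identity(pool, n):
    """Compare the average squared type-2 standard error with E[s_t^2 (1 - r21)].

    pool: list of equally likely item kinds, each a tuple of 0/1 responses
          (one entry per examinee). n: number of items per form.
    Returns both sides as exact fractions.
    """
    m = len(pool)
    N = len(pool[0])
    # True score of each examinee: n times the proportion of pool items passed.
    tau = [Fraction(n * sum(item[a] for item in pool), m) for a in range(N)]
    lhs = sum(ta * (n - ta) for ta in tau) / (n * N)

    total = Fraction(0)
    for form in product(pool, repeat=n):  # all m**n equally likely forms
        t = [sum(item[a] for item in form) for a in range(N)]
        t_bar = Fraction(sum(t), N)
        s2_t = Fraction(sum(v * v for v in t), N) - t_bar ** 2
        total += (n * t_bar - t_bar ** 2 - s2_t) / (n - 1)
    rhs = total / m ** n
    return lhs, rhs


# Hypothetical pool of four item kinds answered by three examinees.
pool = [(1, 1, 0), (1, 0, 0), (0, 1, 1), (1, 1, 1)]
lhs, rhs = check_kr21_identity(pool, n=3)
print(lhs, rhs)
```

Exact rational arithmetic is used so that the two sides can be compared for equality rather than to within rounding error.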






















The logical relation between Kuder-Richardson formulas 20 and 21
can be derived from (1) and (5), from which it is readily found that

$$s_t^2(1 - r_{20}) = s_t^2(1 - r_{21}) - \frac{n^2 s^2(p)}{n-1}. \tag{15}$$

Now the term on the left and the first term on the right of (15) are the squared
standard errors of measurement computed from $r_{20}$ and from $r_{21}$, respectively.
Furthermore, since $n s^2(p)/(n - 1)$ is the unbiased small-sample estimate of
the population variance $\sigma_p^2$, it is seen that the last term on the right is the
small-sample estimator for the squared standard error of the mean score
[see (22)]. Consequently, we may rewrite (15) as

$$(\text{S.E. Meas.}_{20})^2 = (\text{S.E. Meas.}_{21})^2 - \mathrm{S.E.}^2(\bar{t}). \tag{16}$$


The difference between $r_{20}$ and $r_{21}$, as made apparent in (16), arises
from the fact that some randomly parallel forms are, by chance, composed
of harder-than-average items, or of easier-than-average items; consequently,
the mean of the actual scores on any given test is not exactly equal to the
mean of the true scores for the same examinees. The use of $r_{20}$ is appropriate
whenever one is willing to ignore any difference between the mean test score of the
group and their mean true score, i.e., when one is concerned only with the relative
rather than the absolute size of the scores of the group. On the other hand, $r_{21}$
should be used whenever one is concerned with the actual magnitude of the errors
of measurement, e.g., whenever there is a predetermined cutting score which
divides the examinees into passing and failing groups.
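
The relation between the two coefficients is an algebraic identity and can be verified on any data set. The sketch below assumes the identity takes the form $s_t^2(1 - r_{20}) = s_t^2(1 - r_{21}) - n^2 s^2(p)/(n-1)$, which follows from writing the sum of $p_i q_i$ as $\bar{t}(n - \bar{t})/n - n s^2(p)$; the item-score matrix and the function name are hypothetical.

```python
from fractions import Fraction


def kr_error_variances(x):
    """Return both sides of the formula-20 / formula-21 identity, exactly.

    x[i][a] is 1 if examinee a answered item i correctly, else 0.
    """
    n, N = len(x), len(x[0])
    t = [sum(x[i][a] for i in range(n)) for a in range(N)]
    t_bar = Fraction(sum(t), N)
    s2_t = Fraction(sum(v * v for v in t), N) - t_bar ** 2
    p = [Fraction(sum(row), N) for row in x]
    sum_pq = sum(pi * (1 - pi) for pi in p)
    s2_p = sum(pi * pi for pi in p) / n - (sum(p) / n) ** 2
    r20 = Fraction(n, n - 1) * (1 - sum_pq / s2_t)
    r21 = Fraction(n, n - 1) * (1 - t_bar * (n - t_bar) / (n * s2_t))
    left = s2_t * (1 - r20)                                   # from formula 20
    right = s2_t * (1 - r21) - Fraction(n * n, n - 1) * s2_p  # from formula 21
    return left, right


# Hypothetical data: 5 items (rows) answered by 4 examinees (columns).
x = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
]
left, right = kr_error_variances(x)
print(left, right)
```

The subtracted term is zero only when all items are equally difficult, which is the familiar condition under which formulas 20 and 21 agree.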

The foregoing treatment brings to our attention the very important 
fact that S.E.$(t_a)$ is actually the same as the traditional standard error of
measurement of the individual examinee’s score. The first formula in the second column
of Table 1 thus provides a very simple way of computing this important 
quantity. 


5. Comparison with Certain Standard Formulas 


A formula closely related to (4) is the following, adapted from (66) of
reference (8), which will appear familiar to most readers:

$$\mathrm{S.E.}(\bar{t}) = \frac{s_t}{\sqrt{N}}\sqrt{1 - \text{reliability}}. \tag{66'}$$

The question arises as to why S.E.$(\bar{t})$ in (66’) has a totally different
formula from that given in Table 1 for the type-2 standard error of the
mean. If we use (66’) to determine whether or not two forms of a test yield
significantly different mean scores, we will always find the difference to be
significant provided only that we take a sufficiently large number of examinees
($N$) for our experiment. This is true because the standard error in (66’) is
inversely proportional to $\sqrt{N}$; the standard error vanishes when $N$ is large.













(66’) represents the sampling fluctuations of the mean that would be observed 
if the same test were administered to successive samples of N examinees so 
chosen that the distribution of true scores was the same in each sample. If 
the same test is administered twice to the same group of examinees, (66’) 
could be used in investigating the significance of the difference between the 
mean scores obtained on the two testings, provided it can be assumed that 
there is no practice effect. In this case, there is only one test involved, and 
there is thus no sampling of test items. Obviously, (66’) should not be used 
when there is sampling of items—a type-2 standard error is required. 

Consider next Wilks’ (11) and Votaw’s (10) procedures when either of 
these is used as a criterion of ‘‘parallelism”’ in tests, as suggested by Gulliksen 
(3, Ch. 14). Gulliksen defines “‘parallel’’ tests as having equal means, equal 
variances, and equal intercorrelations with each other and with all external 
criteria (as well as satisfying appropriate non-statistical criteria of parallelism). 
Wilks’ and Votaw’s significance tests provide rigorous statistical criteria for 
“parallelism” under this definition. They could appropriately be applied if 
identically the same tests were administered twice to the same examinees, 
provided it could be assumed that no practice effect had occurred. It would 
not be very desirable, however, to apply Wilks’ or Votaw’s procedures to 
data such as were obtained in the second illustrative example given in section 
3. If a test composed of items having a certain characteristic is to be compared 
with a test composed of different items having a second characteristic, it 
may not be very useful to set up the null hypothesis that the two tests are 
strictly interchangeable in every way. Such a null hypothesis will always be 
rejected if N is sufficiently large, but the rejection of this hypothesis does not 
necessarily imply that the first and second characteristics have different 
effect, since the observed discrepancy might be readily accounted for as no 
greater than would be expected to be found in comparing two randomly 
parallel tests composed of the same kind of items. 


6. Sampling Distributions of Test Statistics 


It remains only to present the derivations of the results that have up 
to now been quoted without proof. The derivations are based on the assertion
that there is a definite response ($x_{ia}$) that a given examinee will make to a
given item. The nature of this response may or may not be known in advance.
The group of N examinees to whom the items or tests are administered is a 
fixed group not subject to sampling fluctuation or other changes. 

The responses of the $N$ examinees to item $i$ may be specified by the
column vector $x_i = \{x_{i1}, x_{i2}, \dots, x_{iN}\}$. Since each item response is assumed
to be treated as either “right” or “wrong,” $x_{ia} = 0$ or 1, and there are exactly
$2^N$ possible different vectors, i.e., different patterns of item response. If we
let the subscript $I = 1, 2, 3, \dots, 2^N$, then these possible patterns are
represented by the $2^N$ vectors $x_I$. If two items have exactly the same pattern of
responses, i.e., if the response of each examinee is the same on both items,
then the two items are wholly indistinguishable in the present situation. It
may therefore be asserted without loss of generality that, for present purposes,
any infinite pool of items is composed of $2^N$ different kinds of items, designated
by the $2^N$ vectors $x_I$. The relative frequencies of occurrence of the different
kinds of items are therefore the only parameters needed to describe
completely any infinite pool; these parameters will be denoted by $\pi_I$, the relative
frequencies of occurrence of the patterns $x_I$.

When a random sample of $n$ test items is drawn from the pool, the
probability that the resulting $n$-item test will be composed of $n_1$ items of
the first kind, $n_2$ items of the second kind, $\dots$, $n_I$ items of the $I$th kind, $\dots$,
$n_{2^N}$ items of the $2^N$th kind is given by the standard multinomial distribution
(7, pp. 58-59):

$$f(n_1, n_2, \dots, n_{2^N}) = \frac{n!}{\prod_I n_I!}\prod_I \pi_I^{n_I}. \tag{17}$$

It can be shown (1, p. 419) that the quantities $V_I = (n_I - n\pi_I)/\sqrt{n\pi_I}$
are asymptotically normally distributed for large $n$ with zero means and
with the (singular) variance-covariance matrix $I - \pi\pi'$, where $I$ is the
identity matrix and $\pi$ is the column vector $(\sqrt{\pi_1}, \sqrt{\pi_2}, \dots, \sqrt{\pi_{2^N}})$.
Now, the test score of individual $a$ is $z_a = \sum_i x_{ia}/n = \sum_I x_{Ia} n_I/n$, the
$x_{Ia}$ being given constants, 0 or 1, not subject to sampling fluctuation; or, in
terms of $V_I$,

$$z_a = \sum_I \pi_I x_{Ia} + \frac{1}{\sqrt{n}}\sum_I \sqrt{\pi_I}\, x_{Ia} V_I.$$

The first term on the right is $\zeta_a = \tau_a/n$, the “true” proportion-correct score;
so that, finally,

$$\sqrt{n}\,(z_a - \zeta_a) = \sum_I \sqrt{\pi_I}\, x_{Ia} V_I.$$

It is thus seen that the $N$ variables $\sqrt{n}\,(z_a - \zeta_a)$ are asymptotically jointly
multinormally distributed, each with a mean of zero, a variance which turns
out to be $\zeta_a(1 - \zeta_a)$, and covariances $\zeta_{ab} - \zeta_a\zeta_b$, where $\zeta_{ab}$ is the proportion
of all items answered correctly by both examinee $a$ and examinee $b$. It follows
immediately that the large-sample standard error of $z_a$ is $\sqrt{\zeta_a(1 - \zeta_a)/n}$
[cf. (2)]. The derivation of these and other standard errors will be left to the
following section.

By a well-known theorem, if $f(z_1, z_2, \ldots, z_N)$ is a function of the $z_a$
having continuous first-order partial derivatives with respect to each $z_a$ at
the point $(\zeta_1, \zeta_2, \ldots, \zeta_N)$, and if at least one of these derivatives is
nonvanishing at this point, then the quantity

$$\sqrt{n}\,[f(z_1, z_2, \ldots, z_N) - f(\zeta_1, \zeta_2, \ldots, \zeta_N)]$$

is asymptotically normally distributed with zero mean when n is sufficiently
large. This theorem assures us that the mean score ($\bar z$ or $\bar t$), the standard deviation
of the scores ($s_z$ or $s_t$), the Kuder-Richardson formula-21 reliability ($r_{21}$), and
the test validity ($r_{zc}$ or $r_{tc}$), are approximately normally distributed in type-2
sampling with large n; and in addition gives us the large-sample expected
value of each statistic. It seems highly likely that the Kuder-Richardson
reliability, formula 20, likewise is asymptotically normally distributed,
but no proof of this conclusion is available at present, in view of the fact that
the formula for this statistic involves $\sigma^2(p)$, which is not a function of the $z_a$.

The foregoing proof of asymptotic normality follows a line of reasoning 
that would require n to be very large except when N is very small, viz. N = 2. 
The nature of the situation, however, gives excellent reason to suppose that 
normality is approximated more quickly than the line of proof suggests when 
N > 2. No rigorous proof of this fact has been found. 


7. Derivations of Expected Values and Standard Errors 


The Individual Score 


The proportion of the items in the entire pool to which examinee a will
give the correct answer is, by definition, $\zeta_a = \tau_a/n$. If we concern ourselves
with only a single examinee, the number of correct responses that he gives on
one sample of items is not correlated with the number that he gives on other
samples. If n items are drawn at random from the pool, $t_a$, the score of
examinee a on the resulting test, i.e., the number of items that he will answer
successfully, will of necessity have the usual binomial distribution with
mean and variance

$$E(t_a) = \tau_a , \tag{18}$$

$$\mathrm{S.E.}^2(t_a) = \frac{1}{n}\,\tau_a(n - \tau_a) = n\zeta_a(1 - \zeta_a). \tag{19}$$

This conclusion (and also those that follow, except as large n may be assumed)
depends on no assumptions whatever except that of random sampling. (19) is
identical with (2), which was discussed in a previous section. If the observed
value $t_a$ is substituted for the unknown $\tau_a$ in (19), we obtain the square of the
first formula of Table 1.
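The binomial result in (19) can be checked numerically. The sketch below is only an illustration under assumed values (a hypothetical 50-item test and a true score of 30); the function names are not from the paper. It evaluates (19) directly and compares it with the variance obtained by summing over the binomial distribution.

```python
from math import comb, sqrt

def se_squared_number_right(tau_a, n):
    """Squared standard error of the number-right score, equation (19)."""
    return tau_a * (n - tau_a) / n

def binomial_variance(zeta_a, n):
    """Variance of t_a found by summing over the binomial distribution."""
    mean = n * zeta_a
    return sum(
        comb(n, k) * zeta_a ** k * (1 - zeta_a) ** (n - k) * (k - mean) ** 2
        for k in range(n + 1)
    )

n, tau_a = 50, 30.0      # hypothetical test length and true score
zeta_a = tau_a / n       # true proportion-correct score
se_t = sqrt(se_squared_number_right(tau_a, n))

print(se_squared_number_right(tau_a, n), binomial_variance(zeta_a, n), se_t)
```

In practice $\tau_a$ is unknown, so the observed score would be passed as the first argument, as the text describes.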

For finite sampling, when n items are drawn without replacement from
a finite pool of m items, the corresponding formulas, stated without proof, are

$$E(t_a) = \tau_a , \tag{18'}$$

$$\mathrm{S.E.}^2(t_a) = \frac{m - n}{m - 1}\, n\zeta_a(1 - \zeta_a). \tag{19'}$$



















The Mean Score of the Group Tested 


It should be noted that the scores of examinees a and b are not independent
over different parallel forms of the test. If a particular form happens
to be composed of rather difficult items, both examinees will tend to get low
scores; if a particular form happens to be easy, both will tend to score higher.
Consequently, although the expected value of the mean score in the group is
equal to the mean of the expected values of the individual scores, i.e.,

$$E(\bar t) = \frac{1}{N} \sum_a \tau_a = \bar\tau , \tag{20}$$

the standard error of the mean is not an average of the standard errors of the
individual scores.

It will be convenient from this point on to work with $z_a = t_a/n$, the
proportion-correct score, rather than with $t_a$ itself. The nature of the desired
standard error follows immediately from the fact that the mean score ($\bar z$) is
identically equal to the average item difficulty

$$\bar z = M(p). \tag{21}$$

The usual formulas for the standard error of a mean apply to $M(p)$, so that

$$\mathrm{S.E.}^2(\bar z) = \frac{1}{n}\,\sigma^2(p), \tag{22}$$

where $\sigma(p)$ is the standard deviation of the item difficulties over the whole
pool of items. If the observed value of $s^2(p)$ is substituted for the unknown
$\sigma^2(p)$, and if $t/n$ is substituted for $z$, the square of the second formula of
Table 1 is obtained. [(19) is a special case of (22), being obtained when
$p_i = x_{ia}$.]
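To make (21) and (22) concrete, here is a minimal Python sketch on a made-up response matrix (rows are items, columns are examinees). The data, the function names, and the choice of divisor n for the variance are assumptions for illustration; Table 1 of the paper may use a different divisor.

```python
from math import sqrt

def item_difficulties(X):
    """Proportion of examinees answering each item correctly (rows = items)."""
    return [sum(row) / len(row) for row in X]

def variance(values):
    """Variance with divisor equal to the number of values."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def se_mean_score(X):
    """Plug-in version of (22): s^2(p) stands in for the unknown sigma^2(p)."""
    p = item_difficulties(X)
    return sqrt(variance(p) / len(p))

# Hypothetical data: 4 items (rows) answered by 3 examinees (columns).
X = [[1, 1, 1],
     [1, 1, 0],
     [1, 0, 0],
     [0, 0, 0]]

p = item_difficulties(X)
z = [sum(col) / len(X) for col in zip(*X)]   # proportion-correct scores z_a
mean_p = sum(p) / len(p)                     # M(p)
mean_z = sum(z) / len(z)                     # mean score, equal to M(p) by (21)
se = se_mean_score(X)

print(mean_z, mean_p, se)
```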

In sampling from a finite pool of m items, the corresponding formula,
stated without proof, is

$$\mathrm{S.E.}^2(\bar z) = \frac{m - n}{mn}\,\sigma^2(p). \tag{22'}$$

We may note that $\sigma(p)$ for a given set of items, and hence $\mathrm{S.E.}(\bar z)$ for a
given test, will be higher when N is small than when N is large. Suppose, for
example, that all items have the same difficulty (p) for a very large group of
examinees, so that for this group $\sigma(p) = 0$. If the same items are administered
to a smaller group of examinees drawn at random from the larger, the observed
values of $p_i$ in the smaller group will differ from each other because of type-1
sampling fluctuations, and $\sigma(p)$ will be greater than zero. In the extreme case
where N = 1, the observed values of p are of necessity either 0 or 1, and
$\sigma(p)$ is at a maximum.













The Standard Deviation of the Scores of the Group Tested 
In order to obtain the standard error of $s_z^2$, we first use the formula for
the variance of a sum to write

$$s_z^2 = \frac{1}{n^2} \sum_i \sum_h s_{ih}, \tag{23}$$

$s_{ih}$ being the covariance between item i and item h. Then, again from the
formula for the variance of a sum,

$$\operatorname{var}(s_z^2) = \frac{1}{n^4} \sum_h \sum_i \sum_j \sum_k \operatorname{cov}(s_{hi}, s_{jk}), \tag{24}$$

where "cov" stands for the sampling covariance

$$\operatorname{cov}(s_{hi}, s_{jk}) = E s_{hi}s_{jk} - E s_{hi}\,E s_{jk} .$$

Grouping the sums in (24), we obtain

$$
\begin{aligned}
\operatorname{var} s_z^2 = \frac{1}{n^4}\Bigg[ & \sum_{(h \ne i \ne j \ne k)}^{(n^4 - 6n^3 + 11n^2 - 6n)} \operatorname{cov}(s_{hi}, s_{jk}) + 2 \sum_{(i \ne j \ne k)}^{(n^3 - 3n^2 + 2n)} \operatorname{cov}(s_i^2, s_{jk}) \\
& + 4 \sum_{(i \ne j \ne k)}^{(n^3 - 3n^2 + 2n)} \operatorname{cov}(s_{ij}, s_{ik}) + 4 \sum_{(i \ne j)}^{(n^2 - n)} \operatorname{cov}(s_i^2, s_{ij}) \\
& + \text{other sums containing no more than } n^2 \text{ terms each} \Bigg].
\end{aligned} \tag{25}
$$

Here the first sum is over all sets of four subscripts no two of which are the
same, etc. The coefficient 2 of the second sum arises from combining the two
equivalent expressions $\sum \operatorname{cov}(s_i^2, s_{jk})$ and $\sum \operatorname{cov}(s_{jk}, s_i^2)$. The
other numerical coefficients arise similarly. The polynomials in n written
above the summation signs indicate the number of terms involved in the
summation.

Now, the terms under each set of summation signs in (25) are all the
same no matter what the numerical values of the subscripts; consequently

$$
\begin{aligned}
\operatorname{var} s_z^2 = \frac{1}{n^4}\big[ & (n^4 - 6n^3 + 11n^2 - 6n) \operatorname{cov}(s_{hi}, s_{jk}) + 2(n^3 - 3n^2 + 2n) \operatorname{cov}(s_i^2, s_{jk}) \\
& + 4(n^3 - 3n^2 + 2n) \operatorname{cov}(s_{ij}, s_{ik}) + O(n^2) \big],
\end{aligned} \tag{26}
$$

where $O(n^2)$ stands for terms of order $n^2$. In (26) and in the following paragraph
it is understood that $h \ne i \ne j \ne k$.

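Now, $s_{hi}$ and $s_{jk}$ fluctuate independently over successive samples, so
that $\operatorname{cov}(s_{hi}, s_{jk}) = 0$. The same is true of $s_i^2$ and $s_{jk}$. Consequently,

$$\operatorname{var} s_z^2 = \frac{4}{n^4}(n^3 - 3n^2 + 2n) \operatorname{cov}(s_{ij}, s_{ik}) + O\!\left(\frac{1}{n^2}\right) = \frac{4}{n} \operatorname{cov}(s_{ij}, s_{ik}) + O\!\left(\frac{1}{n^2}\right). \tag{27}$$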


Equation 27 gives the desired result, but not in a very useful form, since
$\operatorname{cov}(s_{ij}, s_{ik})$ is a function of population parameters and is generally not
known. As a final step, then, it will be shown that $s^2(s_{iz})$, the actual variance
(over items 1 to n) of the observed item-test covariances, provides a consistent
estimate of $\operatorname{cov}(s_{ij}, s_{ik})$; it will be proved that

$$E[s^2(s_{iz})] = \operatorname{cov}(s_{ij}, s_{ik}) + O\!\left(\frac{1}{n}\right). \tag{28}$$

From the formula for the covariance of a sum,

$$s_{iz} = \frac{1}{n} \sum_j s_{ij}, \tag{29}$$

$$s^2(s_{iz}) = \frac{1}{n^2} \sum_j \sum_k s(s_{ij}, s_{ik}), \tag{30}$$

the term under the summation sign being the actual covariance (over items
1 to n) of the observed values of $s_{ij}$ and $s_{ik}$:

$$s(s_{ij}, s_{ik}) = \frac{1}{n} \sum_i s_{ij}s_{ik} - \frac{1}{n^2} \sum_h \sum_i s_{hj}s_{ik}. \tag{31}$$

Substituting from (31) into (30), and taking expected values, we find

$$E[s^2(s_{iz})] = \frac{1}{n^3} \sum_i \sum_j \sum_k E(s_{ij}s_{ik}) - \frac{1}{n^4} \sum_h \sum_i \sum_j \sum_k E(s_{hj}s_{ik}). \tag{32}$$

Grouping the sums on the right, we have

$$
\begin{aligned}
E[s^2(s_{iz})] = {} & \frac{1}{n^3}\Bigg[ \sum_{(i \ne j \ne k)}^{n(n-1)(n-2)} E(s_{ij}s_{ik}) + O(n^2) \Bigg] \\
& - \frac{1}{n^4}\Bigg[ \sum_{(h \ne i \ne j \ne k)}^{n(n-1)(n-2)(n-3)} E(s_{hj}s_{ik}) + O(n^3) \Bigg].
\end{aligned} \tag{33}
$$

Now, the terms under each summation sign in (33) are the same regardless
of the numerical value of the subscript. Furthermore, as already pointed
out in deriving (27), $\operatorname{cov}(s_{hj}, s_{ik}) = 0$ when $h \ne i \ne j \ne k$, or in other words,
$E(s_{hj}s_{ik}) - E(s_{hj})E(s_{ik}) = 0$, or $E(s_{hj}s_{ik}) = E(s_{hj})E(s_{ik})$. Consequently,

$$E[s^2(s_{iz})] = E(s_{ij}s_{ik}) - E(s_{ij})E(s_{ik}) + O\!\left(\frac{1}{n}\right). \tag{34}$$

But this is the same as (28), which was to be proved.













The large-sample standard error of $s_z^2$ may therefore be estimated from
the actual variance of the observed item-test covariances:

$$\mathrm{S.E.}^2(s_z^2) = \frac{4}{n}\, s^2(s_{iz}). \tag{35}$$

By means of the "delta" method (5, Vol. 1, pp. 208 ff.), it is readily
shown from (35) that in large samples

$$\mathrm{S.E.}^2(s_z) = \frac{\mathrm{S.E.}^2(s_z^2)}{4 s_z^2} = \frac{s^2(s_{iz})}{n s_z^2}. \tag{36}$$

If $t/n$ is substituted for $z$ in (36), the square of the third equation of
Table 1 is obtained.
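The estimates in (35) and (36) need only the observed item-test covariances. The following Python sketch, on a made-up 4-item by 3-examinee matrix, is an illustration under assumptions: the helper names are hypothetical and all variances and covariances use the divisor equal to the number of observations. It also checks the identity implied by (23) and (29), namely that $s_z^2$ equals the mean of the $s_{iz}$.

```python
def mean(values):
    return sum(values) / len(values)

def cov(u, v):
    """Covariance with divisor len(u); cov(u, u) is the variance of u."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

def item_test_covariances(X):
    """Return (s_iz for each item, proportion-correct scores z_a)."""
    n = len(X)
    z = [sum(col) / n for col in zip(*X)]
    return [cov(row, z) for row in X], z

# Hypothetical data: 4 items (rows) answered by 3 examinees (columns).
X = [[1, 1, 1],
     [1, 1, 0],
     [1, 0, 0],
     [0, 0, 0]]

n = len(X)
s_iz, z = item_test_covariances(X)
s2_z = cov(z, z)                  # observed variance of the scores
s2_siz = cov(s_iz, s_iz)          # variance of the item-test covariances
se2_variance = 4 * s2_siz / n     # equation (35)
se2_sd = s2_siz / (n * s2_z)      # equation (36)

print(s2_z, mean(s_iz), se2_variance, se2_sd)
```

With so few items the large-sample formulas are of course only indicative; the tiny matrix is used so the arithmetic can be followed by hand.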

The corresponding squared standard error for sampling from finite
populations may be shown to be

$$\mathrm{S.E.}^2(s_z^2) = 4\,\frac{m - n}{mn}\, s^2(s_{iz}). \tag{37}$$


The Kuder-Richardson Reliability Coefficient, Formula 20 


Let the usual formula for $r_{20}$, the Kuder-Richardson formula-20 coefficient,
be rewritten as follows:

$$r_{20} = \frac{n}{n - 1}\left(1 - \frac{R}{n}\right), \tag{38}$$

where

$$R = \frac{1}{n} \sum_i s_i^2 \Big/ s_z^2 = M/s_z^2 , \text{ say.}$$

In the extraordinary case where $s_z^2 = 0$, we will agree not to try to compute
any value of $r_{20}$. The "delta" method may now be used to obtain the result

$$\operatorname{var} R = \frac{1}{s_z^4} \operatorname{var} M + \frac{M^2}{s_z^8} \operatorname{var} s_z^2 - \frac{2M}{s_z^6} \operatorname{cov}(M, s_z^2). \tag{39}$$

Now $\operatorname{var}(s_z^2)$ is already known from (35). $\operatorname{Var}(M)$ can be evaluated by the
usual formula for the standard error of a mean:

$$\operatorname{var} M = \frac{1}{n}\, s^2(s_i^2), \tag{40}$$

where $s^2(s_i^2)$ is the actual variance of the observed item variances. Finally,
it is readily shown, by methods similar to those used in evaluating $\operatorname{var}(s_z^2)$,
that

$$\operatorname{cov}(M, s_z^2) = \frac{2}{n}\, s(s_i^2, s_{iz}), \tag{41}$$

where $s(s_i^2, s_{iz})$ is the actual covariance between the observed item variances
and the observed item-test covariances. Consequently,

$$\operatorname{var} R = \frac{1}{n s_z^4}\left[ s^2(s_i^2) + 4R^2 s^2(s_{iz}) - 4R\, s(s_i^2, s_{iz}) \right]. \tag{42}$$

Now $\operatorname{var}(r_{20}) = \operatorname{var}(R)/n^2$; hence, to order $1/n^3$,

$$\mathrm{S.E.}^2(r_{20}) = \frac{1}{n^3 s_z^4}\left[ s^2(s_i^2) + 4n^2(1 - r_{20})^2 s^2(s_{iz}) - 4n(1 - r_{20})\, s(s_i^2, s_{iz}) \right]. \tag{43}$$

It may be noted that the quantity $(1 - r_{20})$ is of order $1/n$, because
$\lim_{n \to \infty} n(1 - r_{20}) = \text{constant}$. It is then seen from (43) that $\mathrm{S.E.}^2(r_{20})$ is a quantity
of order $1/n^3$. Equation 43 leads directly to the fourth formula of Table 1.
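A short Python sketch may help in reading (38) and (43) together. It is an illustration on made-up data, with hypothetical helper names and with every variance and covariance computed using the divisor equal to the number of observations; it is not a reproduction of the paper's Table 1.

```python
def mean(values):
    return sum(values) / len(values)

def cov(u, v):
    """Covariance with divisor len(u); cov(u, u) is the variance of u."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

def kr20_with_se2(X):
    """Return (r20, squared standard error) following (38) and (43)."""
    n = len(X)
    z = [sum(col) / n for col in zip(*X)]        # proportion-correct scores
    s2_z = cov(z, z)
    item_var = [cov(row, row) for row in X]      # s_i^2
    s_iz = [cov(row, z) for row in X]            # item-test covariances
    R = mean(item_var) / s2_z                    # R = M / s_z^2
    r20 = n / (n - 1) * (1 - R / n)              # equation (38)
    c = n * (1 - r20)
    bracket = (cov(item_var, item_var)
               + 4 * c ** 2 * cov(s_iz, s_iz)
               - 4 * c * cov(item_var, s_iz))
    return r20, bracket / (n ** 3 * s2_z ** 2)   # equation (43)

# Hypothetical data: 4 items (rows) answered by 3 examinees (columns).
X = [[1, 1, 1],
     [1, 1, 0],
     [1, 0, 0],
     [0, 0, 0]]

r20, se2_r20 = kr20_with_se2(X)
print(r20, se2_r20)
```

The bracket in (43) is the variance over items of $s_i^2 - 2n(1 - r_{20})s_{iz}$, so it cannot be negative; that is a useful check on any implementation.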

It may be shown that the corresponding standard error when sampling
from a finite population is $(m - n)/m$ times the value given in (43).


The Kuder-Richardson Reliability Coefficient, Formula 21 


By a procedure wholly parallel to that used for the formula-20 reliability
coefficient, it is found that, approximately,

$$
\begin{aligned}
\mathrm{S.E.}^2(r_{21}) = \frac{1}{n^3 s_z^4}\big[ & (1 - 2\bar z)^2 s^2(p) + 4n^2(1 - r_{21})^2 s^2(s_{iz}) \\
& - 4n(1 - r_{21})(1 - 2\bar z)\, s(p_i, s_{iz}) \big],
\end{aligned} \tag{44}
$$

where $s(p_i, s_{iz})$ is the actual covariance between the observed item difficulties
and the observed item-test covariances. Equation (44) leads directly to the
fifth formula of Table 1.

The standard error of the split-half reliability coefficient has not been
worked out. It must, however, be larger than the standard error of $r_{20}$, given
by (43), since $r_{20}$ is the mean of the split-half coefficients from all possible
splits, as shown by Cronbach (2).


The Validity Coefficient 
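If c is an outside criterion,

$$r_{zc} = \frac{s_{zc}}{s_z s_c}. \tag{45}$$

By the "delta" method,

$$\operatorname{var} r_{zc} = \frac{1}{s_z^2 s_c^2} \operatorname{var} s_{zc} + \frac{s_{zc}^2}{4 s_z^6 s_c^2} \operatorname{var} s_z^2 - \frac{s_{zc}}{s_z^4 s_c^2} \operatorname{cov}(s_{zc}, s_z^2). \tag{46}$$

It is found that

$$\operatorname{var} s_{zc} = \frac{1}{n}\, s^2(s_{ic}); \tag{47}$$

$$\operatorname{cov}(s_{zc}, s_z^2) = \frac{2}{n}\, s(s_{ic}, s_{iz}). \tag{48}$$

Finally,

$$\mathrm{S.E.}^2(r_{zc}) = \frac{1}{n s_z^2 s_c^2}\left[ s^2(s_{ic}) - \frac{2 r_{zc} s_c}{s_z}\, s(s_{ic}, s_{iz}) + \frac{r_{zc}^2 s_c^2}{s_z^2}\, s^2(s_{iz}) \right]. \tag{49}$$

Equation (49) leads directly to the last formula of Table 1.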
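The validity formula (49) can be traced on a small example. The Python sketch below is illustrative only: the response matrix, the criterion scores, and the helper names are hypothetical, and all variances and covariances use the divisor equal to the number of observations.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def cov(u, v):
    """Covariance with divisor len(u); cov(u, u) is the variance of u."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

def validity_with_se2(X, c):
    """Return (r_zc, squared standard error) following (45) and (49)."""
    n = len(X)
    z = [sum(col) / n for col in zip(*X)]     # proportion-correct scores
    s2_z, s2_c = cov(z, z), cov(c, c)
    r = cov(z, c) / sqrt(s2_z * s2_c)         # equation (45)
    s_ic = [cov(row, c) for row in X]         # item-criterion covariances
    s_iz = [cov(row, z) for row in X]         # item-test covariances
    ratio = sqrt(s2_c / s2_z)                 # s_c / s_z
    bracket = (cov(s_ic, s_ic)
               - 2 * r * ratio * cov(s_ic, s_iz)
               + (r * ratio) ** 2 * cov(s_iz, s_iz))
    return r, bracket / (n * s2_z * s2_c)     # equation (49)

# Hypothetical data: 4 items (rows), 3 examinees (columns), criterion c.
X = [[1, 1, 1],
     [1, 1, 0],
     [1, 0, 0],
     [0, 0, 0]]
criterion = [3.0, 1.0, 2.0]

r_zc, se2_r_zc = validity_with_se2(X, criterion)
print(r_zc, se2_r_zc)
```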
The corresponding standard error for sampling from a finite pool of 
items is presumably (m — n)/m times the foregoing quantity. 


8. Simultaneous Sampling of Items and Examinees 


Simultaneous and independent sampling of items and examinees might 
be called matrix sampling instead of type-12 sampling. [A generalized approach 
to this problem is reported in (4).] Here, both the population and the sample 
may be thought of as matrices. Each row of the population matrix may be 
taken as representing one test item, and each column as representing one 
examinee. The elements of the matrix are taken to be 1’s and 0’s, depending 
upon whether or not the examinee would answer the item correctly if it were 
administered to him. The actual responses given by a random sample of 
examinees to a test consisting of a random sample of items can be thought 
of as constituting a rectangular matrix composed of n rows and N columns 
selected independently and at random from the population matrix. 

Let y be any statistic calculated from the sample matrix. Consider all
possible $n \times N$ matrices that can be formed from the population matrix by
a process of omitting entire rows and columns. $\operatorname{Var}_{12} y$, the type-12 sampling
variance of y, is, by definition, equal to the variance of the y values calculated
from all possible such $n \times N$ matrices, i.e.,

$$\operatorname{var}_{12} y = E_{12}(y - E_{12}y)^2, \tag{50}$$

where $E_{12}$ indicates that the expectation of the directly following quantity
is to be taken over all possible $n \times N$ matrices. (The convention of always
following each expectation symbol with parentheses or brackets will be
dropped.)

Equation (50) may be made more convenient by application of a very
familiar lemma from analysis of variance, which states that the "total sum
of squares" is equal to the "within sum of squares" plus the "among sum of
squares." It is immediately found that

$$\operatorname{var}_{12} y = E_1[E_{2.1}(y - E_{2.1}y)^2] + E_1(E_{2.1}y - E_{12}y)^2, \tag{51}$$

where $E_{2.1}$ is the conditional expected value over all possible combinations
of rows of the population matrix, the columns being held fixed, and $E_1$ is the
expected value over all possible combinations of columns. In more concise
notation, (51) becomes

$$\operatorname{var}_{12} y = E_1(\operatorname{var}_{2.1} y) + \operatorname{var}_1(E_{2.1}y), \tag{52}$$

where $\operatorname{var}_{2.1}$ and $\operatorname{var}_1$ are type-2 and type-1 sampling variances, respectively.
By symmetry, there is also the alternative equation

$$\operatorname{var}_{12} y = E_2(\operatorname{var}_{1.2} y) + \operatorname{var}_2(E_{1.2}y). \tag{53}$$


If y is a consistent statistic in type-2 sampling, $E_{2.1}y$ will not differ
greatly from y in large samples. This fact suggests that it will often be found
in large samples that

$$\operatorname{var}_1(E_{2.1}y) \doteq \operatorname{var}_1 y. \tag{54}$$

Similarly

$$E_1(\operatorname{var}_{2.1} y) \doteq \operatorname{var}_2 y. \tag{55}$$

If (54) and (55) hold to a satisfactory order of approximation, then (52)
reduces to the very simple result that the type-12 sampling variance is
approximately equal to the sum of the type-1 and type-2 sampling variances

$$\operatorname{var}_{12} y \doteq \operatorname{var}_1 y + \operatorname{var}_2 y. \tag{56}$$

A similar statement can be made for (53).
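The exact decomposition (52) can be verified by brute force on a very small population matrix. The Python sketch below is an illustration under assumptions: the 3-item by 4-examinee matrix is made up, the statistic y is taken to be the variance of the proportion-correct scores, and samples are formed by omitting whole rows and columns as the text describes. Exact fractions are used so the identity holds without rounding.

```python
from fractions import Fraction
from itertools import combinations

def mean(values):
    values = list(values)
    return sum(values, Fraction(0)) / len(values)

def var(values):
    values = list(values)
    m = mean(values)
    return mean((v - m) ** 2 for v in values)

def score_variance(pop, rows, cols):
    """Statistic y: variance of proportion-correct scores in the sample matrix."""
    z = [Fraction(sum(pop[i][a] for i in rows), len(rows)) for a in cols]
    return var(z)

# Hypothetical population matrix: 3 items (rows) x 4 examinees (columns).
pop = [[1, 1, 0, 1],
       [1, 0, 0, 1],
       [0, 1, 0, 0]]
n, N = 2, 2
row_sets = list(combinations(range(len(pop)), n))
col_sets = list(combinations(range(len(pop[0])), N))

# Left side of (52): variance of y over all n x N sample matrices.
var_12 = var(score_variance(pop, r, c) for r in row_sets for c in col_sets)
# E_1(var_2.1 y): average over column sets of the variance over row sets.
within = mean(var(score_variance(pop, r, c) for r in row_sets) for c in col_sets)
# var_1(E_2.1 y): variance over column sets of the mean over row sets.
among = var(mean(score_variance(pop, r, c) for r in row_sets) for c in col_sets)

print(var_12, within + among)
```

The approximation (56), by contrast, is a large-sample statement and would not be expected to hold exactly in so small a case.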

The simple result represented by (56) can be shown to hold in the case
of the mean score, $\bar z$ or $\bar t$, and, at least under the assumption that the scores
are normally distributed, in the case of the standard deviation, $s_z$ or $s_t$.
Proofs are presented in the following two sections.

The type-1 sampling variances of the Kuder-Richardson reliability
coefficients are not known to the writer. Since the type-2 sampling variances
of $r_{20}$ and $r_{21}$ are of order $1/n^3$ [see (43) and (44)], it seems clear that the
type-12 sampling variances of these coefficients, to our order of approximation,
depend only on the unknown type-1 sampling variances. Neither these
nor the type-12 sampling variances of the reliability or validity coefficients
have been worked out.


The Mean Score 
In the case of the mean relative test score, from (20), (22), and (52),

$$\operatorname{var}_{12} \bar z = \frac{1}{n} E_1 \sigma^2(p) + \operatorname{var}_1 \bar\zeta, \tag{57}$$

where $\sigma^2(p)$ is the variance over all items in the population of the values of
$p_i$ for a given group of examinees, and $\bar\zeta = \sum_a \zeta_a / N$.
According to the standard formula for the standard error of any mean,
the last term of (57) is

$$\operatorname{var}_1 \bar\zeta = \frac{1}{N}\, \sigma_\zeta^2 , \tag{58}$$

where $\sigma_\zeta$ is the standard deviation of $\zeta_a$ over the entire population of
examinees.











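Next, it will be helpful to evaluate the first term on the right of (57):

$$\frac{1}{n} E_1 \sigma^2(p) = \frac{1}{n} E_1(E_{2.1} p_i^2 - \bar\zeta^2) = \frac{1}{n}(E_2 E_1 p_i^2 - E_1 \bar\zeta^2). \tag{59}$$

Now the difference $E_1 p_i^2 - \pi_i^2$, where $\pi_i = E_1 p_i$, is by definition $\operatorname{var}_1 p_i$,
the usual binomial variance known to equal $\pi_i(1 - \pi_i)/N$. Hence,

$$E_1 p_i^2 = \frac{N - 1}{N}\, \pi_i^2 + \frac{1}{N}\, \pi_i . \tag{60}$$

Similarly,

$$E_1 \bar\zeta^2 = \frac{1}{N}\, \sigma_\zeta^2 + \bar\pi^2 , \tag{61}$$

where $\bar\pi = E_1 \zeta_a = E_2 \pi_i = E_{12} z_a = E_{12} p_i = E_{12} x_{ia}$ is the over-all population
mean. From (60),

$$E_2 E_1 p_i^2 = \frac{N - 1}{N}\, E_2 \pi_i^2 + \frac{1}{N}\, \bar\pi . \tag{62}$$

As before,

$$E_2 \pi_i^2 = \sigma_\pi^2 + \bar\pi^2 , \tag{63}$$

where $\sigma_\pi$ is the standard deviation of the values of $\pi_i$ over the entire population
of items. From (62) and (63),

$$E_2 E_1 p_i^2 = \frac{N - 1}{N}\, \sigma_\pi^2 + \frac{N - 1}{N}\, \bar\pi^2 + \frac{1}{N}\, \bar\pi . \tag{64}$$

The substitution first of (58) and (59), then of (61) and (64) into (57)
gives finally

$$\operatorname{var}_{12}(\bar z) = \frac{1}{nN}\left[ (n - 1)\sigma_\zeta^2 + (N - 1)\sigma_\pi^2 + \bar\pi(1 - \bar\pi) \right]. \tag{65}$$

Equation (65) gives the exact, small-sample sampling variance of $\bar z$. The same
result can be obtained from (53), as a check.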
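Because (65) is stated to be exact, it can be checked by enumeration. The Python sketch below makes one assumption that should be stated loudly: the infinite populations of items and examinees are emulated by drawing rows and columns independently and with replacement from a small made-up matrix, so that every ordered draw is equally likely. The matrix and the names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def mean(values):
    values = list(values)
    return sum(values, Fraction(0)) / len(values)

def var(values):
    values = list(values)
    m = mean(values)
    return mean((v - m) ** 2 for v in values)

# Hypothetical pool: 2 items (rows) x 3 examinees (columns).
pop = [[1, 1, 0],
       [1, 0, 0]]
n, N = 2, 2   # items and examinees drawn, each with replacement

# Exact variance of the sample mean over every equally likely ordered draw.
zbars = [Fraction(sum(pop[i][a] for i in rows for a in cols), n * N)
         for rows in product(range(len(pop)), repeat=n)
         for cols in product(range(len(pop[0])), repeat=N)]
exact = var(zbars)

pi = [mean(row) for row in pop]            # item difficulties pi_i
zeta = [mean(col) for col in zip(*pop)]    # true scores zeta_a
pi_bar = mean(pi)                          # over-all population mean

# Right side of (65).
formula = ((n - 1) * var(zeta) + (N - 1) * var(pi)
           + pi_bar * (1 - pi_bar)) / (n * N)

print(exact, formula)
```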
When terms of order $1/n^2$ and of order $1/N^2$ are neglected in (65) the
large-sample type-12 sampling variance is found to be approximately

$$\operatorname{var}_{12}(\bar z) = \frac{1}{N}\, \sigma_\zeta^2 + \frac{1}{n}\, \sigma_\pi^2 . \tag{66}$$

Since $\sigma_\zeta^2 \doteq E_2 s_z^2$ [see (69)], it is easily seen that to our order of
approximation

$$\mathrm{S.E.}_{12}^2(\bar z) = \frac{1}{N}\, s_z^2 + \frac{1}{n}\, s^2(p). \tag{67}$$

We thus have the simple result that the type-12 sampling variance of the
mean test score is approximately equal to the sum of the type-1 and the type-2
sampling variances.
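As a small worked illustration of (67), the following Python sketch computes the type-1 and type-2 components separately and adds them. The data and function names are hypothetical, and variances use the divisor equal to the number of observations, which is an assumption rather than something the text specifies.

```python
def variance(values):
    """Variance with divisor equal to the number of values."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def type12_se2_of_mean(X):
    """Return (type-1 part, type-2 part, total) of (67); rows of X are items."""
    n, N = len(X), len(X[0])
    p = [sum(row) / N for row in X]           # item difficulties p_i
    z = [sum(col) / n for col in zip(*X)]     # proportion-correct scores z_a
    type1 = variance(z) / N                   # examinee sampling, s_z^2 / N
    type2 = variance(p) / n                   # item sampling, s^2(p) / n
    return type1, type2, type1 + type2

# Hypothetical data: 4 items (rows) answered by 3 examinees (columns).
X = [[1, 1, 1],
     [1, 1, 0],
     [1, 0, 0],
     [0, 0, 0]]

type1, type2, total = type12_se2_of_mean(X)
print(type1, type2, total)
```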


The Standard Deviation of Scores 
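In the case of $s_z^2$ we find by dividing (12) by $n^2$ that

$$E_2 s_z^2 = \sigma_\zeta^2 + \frac{1}{n}\left[ \bar\zeta(1 - \bar\zeta) - \sigma_\zeta^2 \right] - \operatorname{var}_2 \bar z ; \tag{68}$$

or, dropping terms of order $1/n$,

$$E_2 s_z^2 \doteq \sigma_\zeta^2 , \tag{69}$$

as might be expected.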
Also, a standard formula gives the result

$$\operatorname{var}_1 s_z^2 = \frac{1}{N}\left[ \mu_4(z) - \sigma_z^4 \right], \tag{70}$$

where $\mu_4(z)$ and $\sigma_z^4$ are the fourth and the squared second moments of the
distribution of the scores of all examinees. If we are willing, for the sake of
simplicity, to assume that these scores are effectually normally distributed,
then

$$\operatorname{var}_1 s_z^2 = \frac{2\sigma_z^4}{N} . \tag{71}$$

From (71), (69), and (53), approximately,

$$\operatorname{var}_{12} s_z^2 = \frac{2}{N}\, E_2 \sigma_z^4 + \operatorname{var}_2 \sigma_z^2 . \tag{72}$$

From (72), (69), and (35),

$$\operatorname{var}_{12} s_z^2 = \frac{2}{N}\, \sigma_\zeta^4 + \frac{4}{n}\, s^2(\sigma_{iz}), \tag{73}$$

where $\sigma_\zeta$ is the standard deviation of all true scores and $s^2(\sigma_{iz})$ is the variance
over n items of the true item-test covariances computed using all examinees
in the population. To our order of approximation,

$$\mathrm{S.E.}_{12}^2(s_z^2) = \frac{2}{N}\, s_z^4 + \frac{4}{n}\, s^2(s_{iz}). \tag{74}$$

Under the assumption that z is effectually normally distributed, it is thus
found that the type-12 sampling variance of $s_z^2$ is approximately equal to the
sum of the type-1 and type-2 sampling variances.










REFERENCES 


1. Cramér, H. Mathematical methods of statistics. Princeton Univ. Press, 1946.
2. Cronbach, L. J. Coefficient alpha and the internal structure of tests. Psychometrika,
   1951, 16, 297-334.
3. Gulliksen, H. Theory of mental tests. New York: Wiley, 1950.
4. Hooke, R. Sampling from a matrix, with applications to the theory of testing. Statistical
   Research Group, Princeton University. Memorandum Report 53, 1953. (Dittoed.)
5. Kendall, M. G. The advanced theory of statistics. London: Charles Griffin and Co.,
   1948. 2 vols.
6. Kuder, G. F. and Richardson, M. W. The theory of the estimation of test reliability.
   Psychometrika, 1937, 2, 151-160.
7. Mood, A. M. Introduction to the theory of statistics. New York: McGraw-Hill, 1950.
8. Peters, C. C. and Van Voorhis, W. R. Statistical procedures and their mathematical
   bases. New York: McGraw-Hill, 1940.
9. Tucker, L. R. A note on the estimation of test reliability by the Kuder-Richardson
   formula (20). Psychometrika, 1949, 14, 117-119.
10. Votaw, D. F., Jr. Testing compound symmetry in a normal multivariate distribution.
    Ann. math. Stat., 1948, 19, 447-473.
11. Wilks, S. S. Sample criteria for testing equality of means, equality of variances, and
    equality of covariances in a normal multivariate distribution. Ann. math. Stat., 1946,
    17, 257-281.


Manuscript received 3/1/54 


Revised manuscript received 5/11/54 











PSYCHOMETRIKA—VOL. 20, No. 1 
MARCH, 1955 


SEPARATION OF DATA AS A PRINCIPLE IN FACTOR ANALYSIS 


CHESTER W. HARRIS


UNIVERSITY OF WISCONSIN 


Two systems of factor analysis—factoring correlations with units in 
the diagonal cells and factoring correlations with communalities in the 
diagonal cells—are considered in relation to the commonly used statistical 
procedure of separating a set of data (scores) into two or more parts. It is 
shown that both systems of factor analysis imply the separation of the 
observed data into two orthogonal parts. The matrices used to achieve the 
separation differ for the two systems of factor analysis. 


One of the recurring operations in statistical work is that of separating 
data into parts. Probably the most common example of this is that of separat- 
ing the raw score for each of a number of subjects into a deviation score plus 
the mean of the scores for these subjects. The analysis of variance offers 
examples of this practice, since this method of analysis in effect further 
separates such deviation scores into two or more parts, depending upon the 
complexity of the design. Similarly, linear regression theory postulates a 
separation of the data of the dependent variable into two parts and provides 
a method of calculating each. It is at least intuitively evident that factor 
analysis also implies a separation of data into parts; however, the particular 
characteristics of the principle followed in making the separation may not be 
well understood. The purpose of this discussion is to interpret the procedures 
of factor analysis from this point of view. 

Cochran's theorem (3), particularly Cramér's discussion of it (4, pp.
116-18), shows the necessary and sufficient conditions for decomposing a
sum of squares into orthogonal parts. Consider only the matrices of the
quadratic forms. For such a decomposition, these matrices satisfy the equation

$$I = A_1 + A_2 + \cdots + A_n , \tag{1}$$

with the $A_i$ symmetric idempotent matrices that are pairwise orthogonal
and whose ranks sum to the rank of $I$. Idempotent matrices are singular
matrices such that $A = A^2$; they may be viewed as being generated by
incomplete orthogonal matrices, i.e., sets of orthogonal columns. Thus (1)
implies an orthogonal transformation. The most familiar example of this is
the decomposition

$$\sum x_i^2 = n\bar x^2 + \sum (x_i - \bar x)^2 ,$$

where the summation is over n measures. This might be written

$$XIX' = XA_1X' + XA_2X' \tag{2}$$

with X designating a single row vector of data and the $A_1$ and $A_2$ properly
defined symmetric idempotent matrices. In this case the transformation is
accomplished by an orthogonal matrix with each element of the first column
consisting of the positive reciprocal of the square root of n. The matrix $A_1$
therefore is square, of order n, with each element the positive reciprocal of
n. $A_2$ is square and symmetric, with each diagonal element equal to $(n - 1)/n$,
and each off-diagonal element equal to the negative reciprocal of n. Aitken
(1) demonstrates the independence of these forms for samples of normally
distributed variables. The analysis of variance for a single variable implies
the decomposition of the matrix $A_2$ in (2) into two or more pairwise orthogonal
idempotent matrices. For the simplest design, $A_2$ is separated into two parts,
one associated with the notion of "variance between" and the other with the
notion of "variance within." The further separation of the matrix associated
with "variance between" implies more complicated designs, such as a factorial
arrangement of groups. Aitken's paper also gives necessary and sufficient
conditions for the independence of two quadratic forms.
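The matrices $A_1$ and $A_2$ described above are easy to build and check. The following Python sketch is an illustration with a made-up data vector; the helper names are hypothetical. It uses exact fractions to confirm that both matrices are idempotent, that they are orthogonal to each other, and that the two quadratic forms reproduce the familiar decomposition of the sum of squares.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def quad_form(x, A):
    """The quadratic form x A x' for a row vector x."""
    size = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(size) for j in range(size))

n = 4
# A1: every element is the positive reciprocal of n.
A1 = [[Fraction(1, n)] * n for _ in range(n)]
# A2 = I - A1: diagonal (n - 1)/n, off-diagonal -1/n.
A2 = [[(1 if i == j else 0) - A1[i][j] for j in range(n)] for i in range(n)]

x = [2, 4, 4, 6]                      # hypothetical data vector X
x_bar = Fraction(sum(x), n)

mean_part = quad_form(x, A1)          # n * x_bar^2
deviation_part = quad_form(x, A2)     # sum of squared deviations
total = sum(v * v for v in x)         # X I X'

print(mean_part, deviation_part, total)
```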

Since the symmetric idempotent matrices detailed in (1) are pairwise
orthogonal, i.e., $A_iA_j = A_jA_i = 0$, it follows that for any matrix X,
$(XA_i)(XA_j)' = (XA_j)(XA_i)' = 0$. This is a function of the matrices of the
quadratic forms, and does not imply a particular distribution for the population
from which X is drawn. Cochran's theorem shows that the sampling
distribution of terms such as those on the right of (2) is known if X is a
random sample from a normal population (univariate case). Bartlett (2) has
discussed tests of significance for a decomposition of the form

$$X = XA_i + XA_j , \tag{3}$$

where X is a sample from a multivariate normal population with mean zero.
Here, $A_i$ and $A_j$ are two parts of the matrix $A_2$ as it was defined for equation
(2). Two points have been emphasized. One is that choosing symmetric
idempotent matrices that are pairwise orthogonal as the matrices of the
quadratic forms gives a decomposition, as in (3), for which $(XA_i)(XA_j)' =
(XA_j)(XA_i)' = 0$. Second, under certain assumptions regarding the nature
of the data, sampling distributions of statistics derived from the parts given
by such a decomposition are known. The remainder of the paper will consist
of a discussion of the principle given by this first point in relation to factor
analysis.

The following geometric representation of factor analysis is well known.
For convenience, it will be assumed that the data have been scaled to unit
variance; this may be a critical assumption from the statistical point of view,
as Rao (8) shows, and the making of it emphasizes the attempt in this paper
to describe factor analysis rather generally and not to treat the complicated
inferential problems. It is possible to regard the n persons as defining a
space within which are located the k tests. The person axes are assumed to
constitute a rectangular Cartesian system. Then the n measures for a given
variable, when put in deviation form and scaled to unit variance, are the
coordinates of a point in this person space that, when joined to the origin,
defines the variable or test as a vector. Any factor may also be viewed as a
unit-length vector located in this person space; such a factor is uniquely
located by a set of n coordinates defining its end-point or by a set of direction
cosines with respect to the n person axes. Define Z as a $k \times n$ matrix of data,
such that $ZZ' = R_1$. The matrix of intercorrelations with units in the
diagonals is designated by $R_1$ and $Z'$ is the conventional transpose of Z.
Let y be a column of direction cosines locating a single factor in the person
space. Then $Zy$ is the column of k scalar products of variables with this
factor; these are correlation coefficients here, and may be regarded as a column
of the factor matrix. Finally, $Z(yy')$ gives the coordinates, with respect to
the person axes, of the perpendicular projections onto the factor axis of the
points representing the variables. This expression $Z(yy')$ also is the portion
of the data, Z, that is accounted for by the first factor. In other words,

$$Z = Z(yy') + Z(I - yy') \tag{4}$$

describes the separation of Z into two parts, one of which is associated with
the factor that is located in the person space by the column of direction
cosines, y, and the other part a remainder.

Equation (4) necessarily represents a separation of Z into two orthogonal
parts. This is true because $yy'$ is idempotent, i.e., $yy'yy' = yy'$, and consequently
$I - yy'$ also is idempotent; therefore $yy'(I - yy') = (I - yy')yy' = 0$.
It also is true that the matrix $Z(yy')$ is the least-squares approximation
of the row $y'$ to the rows of Z. This follows from least-squares theory; for a
summary of the role of symmetrical idempotent matrices in generating
least-squares approximations see Harris (6). In general, then, the specification of
y, i.e., the direction cosines of a single factor, provides a separation of the
data Z into two orthogonal parts, one of which is the least-squares approximation
of $y'$ to Z. The only requirement imposed upon y has been that it
designate a factor axis in the person space; in other words, y has been chosen
arbitrarily from the indefinitely many possible unit-length vectors in the
person space.
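The separation in (4) can be traced with exact arithmetic. The Python sketch below is an illustration only: the data matrix Z and the direction cosines y are made up, and Z is not put in deviation form or scaled to unit variance, so the scalar products Zy here are not correlation coefficients. The point is simply that any unit-length y yields two orthogonal parts that sum to Z.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

# Hypothetical data: k = 2 tests (rows) measured on n = 3 persons (columns).
Z = [[1, 2, 3],
     [0, 1, -1]]
# An arbitrary unit-length column of direction cosines: (3/5)^2 + (4/5)^2 = 1.
y = [Fraction(3, 5), Fraction(4, 5), Fraction(0)]
n = len(y)

P = [[y[i] * y[j] for j in range(n)] for i in range(n)]                     # yy'
Q = [[(1 if i == j else 0) - P[i][j] for j in range(n)] for i in range(n)]  # I - yy'

factor_part = matmul(Z, P)      # Z(yy'), the part associated with the factor
remainder = matmul(Z, Q)        # Z(I - yy'), the remainder
scalar_products = [sum(z * c for z, c in zip(row, y)) for row in Z]         # Zy

print(scalar_products)
print(factor_part)
print(remainder)
```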

Equation (4) might be written more generally as

$$Z = ZA + Z(I - A), \tag{5}$$

where A designates a symmetrical idempotent matrix, i.e., $A = A^2$. Every
symmetrical idempotent matrix may be viewed as the product $YY'$, where Y
is a set of orthogonal columns, i.e., an incomplete orthogonal matrix. (If Y
is the complete orthogonal matrix, $A = I$, of course.) Equation (5) then is
the case of selecting one or more mutually orthogonal unit-length axes in the
person space as a set of factors; the direction cosines of these factors are given
by Y. As before, the two parts on the right of (5) are uncorrelated and $ZA$
is the least-squares approximation of $Y'$ to Z. The matrix $Y'$ is, of course,
also regarded as the set of uncorrelated factor scores, each with unit variance.
Again it should be emphasized that Y is arbitrary and might designate any
set of orthogonal axes in the person space.

So far, then, it has been shown that the specification of one or more 
factors leads to the separation of Z into two orthogonal parts, one of which 
is a particular least-squares approximation. 

The final step is to consider two approaches to factor analysis that differ 
primarily in the way in which the factor space is defined. The nature of these 
two approaches can be illustrated by considering the correlation between 
two variables. If the two variables are viewed as two unit-length vectors 
located in the person space, then the variable space is (at most) a plane, i.e., 
of dimension two. It is possible to define the factor space as identical with the 
variable space; this definition corresponds to choosing to factor the unit 
variances and the intercorrelations of the variables. If the complete
intercorrelation matrix is non-singular, two factors may be extracted by this
procedure. They would, necessarily, define the variable space. If only one
factor is extracted, it would be represented by a line embedded in this variable 
space. Spearman’s approach to this problem differs. His approach defines the 
common-factor space as the line formed by the intersection of two orthogonal 
planes, in each of which lies one of the unit-length vectors. The uncorrelated 
unique factors are defined by lines perpendicular to this single common- 
factor axis and, of course, also lie in these two intersecting planes. For two 
variables that are not correlated perfectly, i.e., are not collinear, the Spearman 
approach necessarily defines the common-factor space as distinct from the 
variable space. This definition corresponds to choosing to factor communali- 
ties and intercorrelations, rather than the correlation matrix with units in 
the diagonals. 

The first approach, i.e., factoring $R_1$, requires that any factor axis be
embedded in the variable space; as a result, the factor might be located by
reference to the k axes of the Cartesian system provided by the test vectors,
as well as by reference to the n person axes. Obviously, the test vectors need
not form a rectangular reference system. This means, then, that for any such
factor there is some set of weights that, when applied to the variables, gives
a linear combination of the entries in Z that reproduces the set of factor
scores. Holzinger (7) gives illustrations of this principle, using both the
centroid and what has since become known as the multiple-group methods
of factor analysis. Using this approach, it may be pertinent to determine a
"best" location of a factor. Eckart and Young (5) have shown the nature of
the best approximation, in a least-squares sense, of a matrix of data, Z, by
another matrix of specified lower rank. Securing this best approximation
is equivalent to identifying the first r principal-axis factors of $R_1$, where
r is the specified rank that is lower than the rank of Z. Defining the total
factor space as identical with the total test space and then extracting r
principal-axis factors from $R_1$ gives a separation of Z into the two parts of
(5) such that $Z(I - A)Z'$ has a minimum trace compared with its trace for
any other definition of A. That A is well-defined is evident from noting that
A is generated by the r unit-length characteristic vectors of $Z'Z$ that correspond
to the r largest characteristic roots of $ZZ'$, which necessarily are the
same as those of $Z'Z$. The Eckart and Young results therefore show that
their choice of Y gives a matrix $ZA$ which is not only the least-squares
approximation of $Y'$ to Z, as it must be when A is generated by Y, but also a best
approximation to Z.

Finally, it is evident that the communality principle in factor analysis 
also postulates an equation of the form of (5), since the common factors are 
defined by some set of direction cosines, Y. However, the common factors 
are not embedded in the variable space and consequently the elements of Y 
cannot be calculated from the data, Z. Thomson (9, p. 78) comments on this 
point. This means, then, that when equation (5) is used to describe the com- 
munality principle in factor analysis it must be regarded as a formal equation 
with A = YY’ unknown. If A were known, then a principal-axis resolution 
of ZA into factors and factor scores, 


ZA = FS’, (6) 


would lead to the definition of A as SS’. This would follow from noting that 
SS’ is a unit for multiplication on the right of ZA that is of the same rank 
as A and recalling that a multiplication unit is unique within a group of 
singular matrices. However, this definition is circular, in that S, the factor 
matrix of factor scores, necessarily is identical with Y; the unknown A 
remains unknown. 

This discussion has emphasized the connection between factor analysis 
and well-known procedures for separating data into two or more parts. 
Following Cochran and Cramér, the separation of data into orthogonal parts 
was formulated in terms of symmetric idempotent matrices as the matrices 
of the quadratic forms. It was then shown that from the geometric view 
of factor analysis the specification of one or more factors is the specification 
of one or more sets of direction cosines that generate a symmetric idempotent 
matrix and that this matrix, A, and its annihilator, (I − A), achieve a
separation of the data. The nature of the matrix A was examined for two 
different approaches to factor analysis. For the first approach, Eckart and 
Young’s results were reviewed to show that a minimum trace of Z(I − A)Z’
is achieved by the principal-axis factoring of $R_1$. For the communality
approach, the merely formal character of A was emphasized. Although 
problems of estimation and statistical inference were not considered in this 
paper, this final result lends support to the belief that the communality 
principle poses important problems of statistical estimation. 


REFERENCES 


1. Aitken, A. C. On the independence of linear and quadratic forms in samples of normally
distributed variables. Proc. royal Soc. Edinburgh, 1939, 60, 40-46. 

2. Bartlett, M. S. Multivariate analysis. J. royal stat. Soc. Sup, 1947, 9, 176-90. 

3. Cochran, W. G. The distribution of quadratic forms in a normal system with applica- 
tions to the analysis of variance. Proc. Cambridge phil. Soc., 1934, 30, 178-91. 

4. Cramér, Harald. Mathematical methods of statistics. Princeton, N.J.: Princeton Univ. 
Press, 1946. 

5. Eckart, Carl, and Young, Gale. The approximation of one matrix by another of lower 
rank. Psychometrika, 1936, 1, 211-18. 

6. Harris, Chester W. The symmetrical idempotent matrix in factor analysis. J. exp. 
Educ., 1951, 19, 239-46. 

7. Holzinger, Karl J. Factoring test scores and implications for the method of averages. 
Psychometrika, 1944, 9, 155-67. 

8. Rao, C. R. Estimation and tests of significance in factor analysis. (mimeographed). 

9. Thomson, Godfrey. The factorial analysis of human ability. Boston: Houghton Mifflin 
Co., 1950. 4th edition. 


Manuscript received 1/25/54 


Revised manuscript received 4/9/54 




















PSYCHOMETRIKA—VOL. 20, No. 1 
MARCH, 1955 


THE CHOICE OF AN ERROR TERM IN 
ANALYSIS OF VARIANCE DESIGNS* 


ARNOLD BINDER 


INDIANA UNIVERSITY†


This article presents a survey of the assumptions which may be made 
in variance designs, a description of the mathematical models which reflect 
these assumptions, and a discussion of the ways in which various experimental 
conditions affect the choice of an error mean square. Particular emphasis is 
laid upon the principles, purposes, and dangers of pooling error mean squares 
in order to raise the power of a test. Specific recommendations are made for 
the rules of procedure for pooling (under various conditions) which produce 
tests with optimum power and error characteristics. 


Among the various treatments of psychological statistics one finds a 
good deal of confusion and discrepancy in the recommended procedures for 
selecting an error term in the analysis of variance (see 4, 5, 7, as examples). 
In all too many cases the obtained significance or insignificance of the ex- 
perimental results depends as much upon the particular statistics text used 
as upon the sampling data. The aim of this paper is to show the possible 
assumptions which may be made in regard to analysis of variance data, some 
of the hypotheses which may be tested, and how these and other factors 
influence the choice of the error term. Because of space limitations, the 
arguments will be restricted to a two-factor (or double classification) arrange- 
ment with m replications per cell. Many of the arguments presented here are 
directly translatable into the more complex designs. 

Unfortunately, the derivations of the proper terms for testing various 
hypotheses under the conditions specified by the assumptions require a good 
deal of mathematical sophistication for their understanding. While references 
will, in all cases, be made to the sources in which the proofs may be found, 
this paper is aimed principally at the reader who is less interested in rigorous 
mathematical analysis than in the uses of the material in research design. 
For the purpose of identifying the various potential groups of assumptions 
and demonstrating the proper statistics under these assumptions, we shall 
make use of three mathematical models: linear hypothesis model, components 
of variance model, and mixed model. 


*The writer is indebted to Professors Quinn McNemar and Lincoln Moses of Stanford
University for reading the manuscript and offering many helpful suggestions and criticisms.
He is grateful to Professor Z. W. Birnbaum of the University of Washington for preliminary
suggestions as to form and notation.
†The preliminary draft of this paper was completed while the author was at Stanford
University and the Veterans Administration Hospital, Palo Alto.

TABLE 1
Variance Schema for Two-Factor, m Replications Design

Source                          D. of F.                     Mean Square

Rows                            (r − 1)                      $s_r^2 = \dfrac{cm \sum_{i=1}^{r} (\bar{X}_{i..} - \bar{X}_{...})^2}{r - 1}$

Columns                         (c − 1)                      $s_c^2 = \dfrac{rm \sum_{j=1}^{c} (\bar{X}_{.j.} - \bar{X}_{...})^2}{c - 1}$

Interaction                     (r − 1)(c − 1)               $s_i^2 = \dfrac{m \sum_{i=1}^{r} \sum_{j=1}^{c} (\bar{X}_{ij.} - \bar{X}_{i..} - \bar{X}_{.j.} + \bar{X}_{...})^2}{(r - 1)(c - 1)}$

Within cells                    rc(m − 1)                    $s_w^2 = \dfrac{\sum_{i=1}^{r} \sum_{j=1}^{c} \sum_{k=1}^{m} (X_{ijk} - \bar{X}_{ij.})^2}{rc(m - 1)}$

Interaction plus within cells   (r − 1)(c − 1) + rc(m − 1)   $s_{i+w}^2 = \dfrac{(r - 1)(c - 1)\, s_i^2 + rc(m - 1)\, s_w^2}{(r - 1)(c - 1) + rc(m - 1)}$


The data and definitions of Table 1 will be used throughout the paper. 





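In the table, $X_{ijk}$ is the observation in the ith row (i = 1, ..., r) and the
jth column (j = 1, ..., c) and for the kth replication (k = 1, ..., m); and

$$\bar{X}_{ij.} = \text{observed cell mean} = \frac{\sum_{k=1}^{m} X_{ijk}}{m},$$

$$\bar{X}_{i..} = \text{observed row mean} = \frac{\sum_{j=1}^{c} \sum_{k=1}^{m} X_{ijk}}{cm},$$

$$\bar{X}_{.j.} = \text{observed column mean} = \frac{\sum_{i=1}^{r} \sum_{k=1}^{m} X_{ijk}}{rm},$$

$$\bar{X}_{...} = \text{observed over-all mean} = \frac{\sum_{i=1}^{r} \sum_{j=1}^{c} \sum_{k=1}^{m} X_{ijk}}{rcm}.$$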
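
As a computational aside, the means just defined and the mean squares of Table 1 may be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the nested-list layout `x[i][j][k]`, and the small data set are hypothetical and do not come from the paper, and the layout is assumed to be balanced with r, c, and m all at least 2.

```python
def two_factor_mean_squares(x):
    """Mean squares for a balanced layout; x[i][j][k] = row i, column j, replication k."""
    r, c, m = len(x), len(x[0]), len(x[0][0])

    # Observed cell, row, column, and over-all means.
    cell = [[sum(x[i][j]) / m for j in range(c)] for i in range(r)]
    row = [sum(cell[i]) / c for i in range(r)]
    col = [sum(cell[i][j] for i in range(r)) / r for j in range(c)]
    grand = sum(row) / r

    # Sums of squares for each source of variation.
    ss_rows = c * m * sum((row[i] - grand) ** 2 for i in range(r))
    ss_cols = r * m * sum((col[j] - grand) ** 2 for j in range(c))
    ss_inter = m * sum(
        (cell[i][j] - row[i] - col[j] + grand) ** 2
        for i in range(r) for j in range(c)
    )
    ss_within = sum(
        (x[i][j][k] - cell[i][j]) ** 2
        for i in range(r) for j in range(c) for k in range(m)
    )

    # Degrees of freedom for interaction and within cells.
    n_inter, n_within = (r - 1) * (c - 1), r * c * (m - 1)
    return {
        "rows": ss_rows / (r - 1),
        "columns": ss_cols / (c - 1),
        "interaction": ss_inter / n_inter,
        "within": ss_within / n_within,
        # Pooled error: combined sums of squares over combined degrees of freedom.
        "pooled": (ss_inter + ss_within) / (n_inter + n_within),
    }


# Hypothetical 2 x 2 layout with 2 replications per cell.
data = [[[1, 3], [5, 7]], [[2, 4], [10, 12]]]
ms = two_factor_mean_squares(data)
print(ms)
```

For these hypothetical data the four sums of squares (18, 72, 8, and 8) add up to the total sum of squares about the over-all mean (106), which is a convenient check on the arithmetic.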


Before dealing with the differences among the various models, let us 
consider the meaning of row, column, and interaction “effects.” Each factor
(or classification) is a characteristic or variable (such as individuals, conditions,
tests, or treatments) which includes a number of different specific
elements. In our case there are r elements in the data for the row factor and 
c elements in the data for the column factor. It is assumed in the analysis of 
variance that the value of each $X_{ijk}$ observation is derived from two contributing
sources: one dependent upon the particular row and column elements
to which the particular unit belongs, the other independent of these elements. 
The first of the two contributing sources includes the row, column, and 
interaction effects; the second includes errors of observation and an over-all 
value constant for all of the observations in the data. Thus, the row effect 
is the magnitude of the contribution of a particular row element to the 
observed values of all units which it encompasses; the column effect is the 
magnitude of the contribution of a particular column element to the observed 
values of all units which it encompasses; and the interaction effect is the 
magnitude of the contribution due to the coming together of a particular 
row element with a particular column element. 

The effects, the over-all value, and the errors of observation are assumed 
to be independent; their sum determines the various observed values. Since 
the mean of the errors of observation is assumed to be zero, and since the 
values within any one cell are assumed to vary only as a result of these 
measurement errors, the average value of $X_{ijk}$ over a great number of replications
within any one cell would be expected to be equal to the sum of its
row, column, and interaction effects plus the over-all value. (In this paper
the over-all value in the case of the components of variance model is set 
equal to zero.) 

For the most part, the above discussion applies only to the pure case in 
which replications are actually exact repetitions of each of the rc conditions. 
In many cases it is not feasible to have these exact replications within each 
cell because of effects of such factors as learning and motivation. But in 
using different (although comparable) units within the various cells, one 
introduces sampling errors in addition to the measurement errors. These 
sampling errors, however, can be made self-compensating, in the sense that 
they make an equal contribution (on the average) to all of the mean squares, 
by appropriate sampling and design. The examples in the following sections 
are illustrative of the ways in which sampling errors may be made self- 
compensating. If the sampling errors are handled in this way, the analysis 
is reducible to the general model established above. 

As an illustration, let us take an experiment in which the row factor 
consists of individuals (the row elements being the specific individuals) and 
the column factor consists of a series of tests of visual acuity (the column 
elements being the specific tests). Let us assume an observed value or score 
of 30 for the second replication in the cell intersected by the third row and 
the first column (i.e., $X_{312}$ = 30). This observation or score is made up of a
number of components which sum to produce the specific value. We assume 
for the sake of exposition that the component contribution uniform for all 
observations in the third row is equal to 11 (i.e., the third row effect is equal 
to 11), that the component contribution of the visual acuity test in the first 
column for all of the observations which it encompasses is equal to 8 (i.e., 
the first column effect is equal to 8), that the component contribution of the 
unique interaction between the individual in the third row and the test in 
the first column for all of the observations in the (r = 3, c = 1) cell is equal 
to 3 [i.e., the (r = 3, c = 1) cell interaction effect is equal to 3], that the 
over-all value is equal to 6, and that the error involved in the second replica- 
tion in the (r = 3, c = 1) cell is equal to 2. Thus, the assumption of the 
analysis of variance implies that the individual in the third row taking the 
test in the first column for the second replication will obtain the specified 
observed value or score of 11 + 8 + 3 + 6 + 2 = 30.
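
The additive assumption in this illustration can be written out as a short Python sketch. The function and argument names are hypothetical and serve only to label the five components named in the text.

```python
def observed_value(overall, row_effect, column_effect, interaction_effect, error):
    """Additive model: an observation is the sum of its five components."""
    return overall + row_effect + column_effect + interaction_effect + error


# Components from the visual acuity illustration (third row, first column,
# second replication).
score = observed_value(overall=6, row_effect=11, column_effect=8,
                       interaction_effect=3, error=2)

# With the error set to zero we obtain the value expected for this cell
# over a great number of replications.
expected_cell_value = observed_value(6, 11, 8, 3, error=0)
print(score, expected_cell_value)
```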

The essential difference between the linear hypothesis model and the 
components of variance model is that the main effects of the former are 
fixed and constant whereas the main effects of the latter are random variables. 
All other differences result from the different mathematical treatments 
necessitated by this distinction. In order to have fixed and constant effects it 
is necessary that the elements of each factor be unique and not determined 
by random sampling; in order to have random effects the elements of the 
factors must be selected by simple random sampling from a larger population. 
Thus, if the entire population of elements is included in a particular factor,
its effects are fixed; if a random sampling of elements from some larger 
population is included in a particular factor, its effects are random. Examples 
of these kinds of effects will be presented later. The mixed model has one 
factor with fixed effects and the other factor with random effects. 

When one makes a statistical test on the row effects he is testing an 
hypothesis of the type: Among the elements of the row factor (whether these 
be fixed or random) there is no variation in the magnitude of the contribution
to the obtained observations. In other words: All the row effects are equal. 
Similarly for the column and interaction effects. 


I. Linear Hypothesis Model 


For purposes of exposition it will be convenient to divide the linear 
hypothesis model into three cases: (a) no a priori assumption as to inter- 
action is made; (b) the a priori assumption is made that the interaction 
effects are equal, but no preliminary test of this assumption is desired; and 
(c) a preliminary test of the assumption of no interaction is made. 


Case (a): Linear hypothesis model; no interaction assumption. 


We assume

$$X_{ijk} = \mu_{ij.} + \mu_{i..} + \mu_{.j.} + \mu + e_{ijk}, \tag{1}$$

where

$\mu_{ij.}$ = fixed interaction effect, (i = 1, ..., r), (j = 1, ..., c);
$\mu_{i..}$ = fixed row effect, (i = 1, ..., r);
$\mu_{.j.}$ = fixed column effect, (j = 1, ..., c);
$\mu$ = over-all value (most commonly called the general mean).


That is, we assume that a particular observation is determined by the sum of 
the over-all mean, the effect of the row of which it is a member, the effect 
of the column of which it is a member, the effect of the row by column inter- 
action for the cell of which it is a member, and an error term. 

We further assume that the $e_{ijk}$ are independent random variables,
normally distributed with mean equal to zero and variance equal to $\sigma^2$
(unknown).

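We also have the assumptions

$$\sum_{i=1}^{r} \mu_{ij.} = 0, \quad \sum_{j=1}^{c} \mu_{ij.} = 0, \quad \sum_{i=1}^{r} \mu_{i..} = 0, \quad \sum_{j=1}^{c} \mu_{.j.} = 0. \tag{2}$$

These latter restrictions do not involve any loss of generality; if the effects
which sum to make up $X_{ijk}$ do not meet these assumptions, new values,
which meet these restrictions as well as the additive assumption, may be
derived linearly from the original ones. That is, suppose we assume only that

$$X_{ijk} = \mu_{ij.} + \mu_{i..} + \mu_{.j.} + \mu + e_{ijk};$$

we may then derive

$$\mu' = \mu + \frac{1}{rc} \sum_{i=1}^{r} \sum_{j=1}^{c} \mu_{ij.} + \frac{1}{r} \sum_{i=1}^{r} \mu_{i..} + \frac{1}{c} \sum_{j=1}^{c} \mu_{.j.}, \tag{3}$$

$$\mu'_{i..} = \mu_{i..} + \frac{1}{c} \sum_{j=1}^{c} \mu_{ij.} - \frac{1}{rc} \sum_{i=1}^{r} \sum_{j=1}^{c} \mu_{ij.} - \frac{1}{r} \sum_{i=1}^{r} \mu_{i..}, \tag{4}$$

$$\mu'_{.j.} = \frac{1}{r} \sum_{i=1}^{r} \mu_{ij.} + \mu_{.j.} - \frac{1}{rc} \sum_{i=1}^{r} \sum_{j=1}^{c} \mu_{ij.} - \frac{1}{c} \sum_{j=1}^{c} \mu_{.j.}, \tag{5}$$

$$\mu'_{ij.} = \mu_{ij.} - \frac{1}{c} \sum_{j=1}^{c} \mu_{ij.} - \frac{1}{r} \sum_{i=1}^{r} \mu_{ij.} + \frac{1}{rc} \sum_{i=1}^{r} \sum_{j=1}^{c} \mu_{ij.}, \tag{6}$$

so that the derived values satisfy all of the assumptions and may be called
the “effects.”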
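
The derivation of restricted effects from unrestricted ones can be checked numerically. The sketch below is an illustration only: the function name and the starting values are hypothetical, and it assumes that the new over-all value, row effects, column effects, and interaction effects are obtained as in (3), (4), (5), and (6) respectively. It verifies that the derived values sum to zero as required by (2) while leaving every expected cell value unchanged.

```python
def constrained_effects(overall, row, col, inter):
    """Re-express additive components so that they satisfy the zero-sum restrictions."""
    r, c = len(row), len(col)
    inter_over_j = [sum(inter[i]) / c for i in range(r)]                        # (1/c) sum over j
    inter_over_i = [sum(inter[i][j] for i in range(r)) / r for j in range(c)]   # (1/r) sum over i
    inter_mean = sum(inter_over_j) / r                                          # (1/rc) double sum
    row_mean = sum(row) / r
    col_mean = sum(col) / c

    new_overall = overall + inter_mean + row_mean + col_mean                        # as in (3)
    new_row = [row[i] + inter_over_j[i] - inter_mean - row_mean for i in range(r)]  # as in (4)
    new_col = [col[j] + inter_over_i[j] - inter_mean - col_mean for j in range(c)]  # as in (5)
    new_inter = [
        [inter[i][j] - inter_over_j[i] - inter_over_i[j] + inter_mean for j in range(c)]
        for i in range(r)
    ]                                                                               # as in (6)
    return new_overall, new_row, new_col, new_inter


# Hypothetical unrestricted components for a 2 x 3 layout.
overall, row, col = 6.0, [11.0, 2.0], [8.0, 1.0, 0.0]
inter = [[3.0, 0.0, 1.0], [2.0, 5.0, -1.0]]
new_overall, new_row, new_col, new_inter = constrained_effects(overall, row, col, inter)

# Largest change in any expected cell value; it should be zero up to rounding.
max_cell_change = max(
    abs((overall + row[i] + col[j] + inter[i][j])
        - (new_overall + new_row[i] + new_col[j] + new_inter[i][j]))
    for i in range(2) for j in range(3)
)
print(sum(new_row), sum(new_col), max_cell_change)
```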

(Note that we make no assumptions as to the existence or vanishing of
the interaction effects for this case; i.e., the $\mu_{ij.}$ may be equal to 0 or to
$K_{ij.}$, where $K_{ij.} \neq 0$, for all i, j.)

An example of this model would be an experiment in which c types of 
psychotherapy (perhaps directive versus nondirective or interpretive versus 
suggestive versus reflective) are employed by all r psychotherapists of a 
particular clinic in an attempt to have various subjects recall certain repressed 
material. Under each of the rc conditions there are m subjects; each subject 
is used under only one set of conditions (making the total number of subjects 
rcm). The recorded score in each case would be the time required for the
particular therapist, with the particular therapeutic technique and with the 
particular subject, to get this subject to recall spontaneously certain (con- 
trolled) repressed material. Notice that we can generalize our results no 
further than to the therapists in this particular clinic since we considered 
the therapists not as a random sample from some larger population but 
rather as being fixed and distinct. The r therapists constitute our row factor 
population. (This distinction will be clearer after the discussion relative to 
the components of variance model is read.) If the experimenter had wanted 
to generalize his results to all psychotherapists in a particular area, while 
still utilizing the same c therapeutic techniques, he would have had to select 
a random sample of r therapists from the entire population of therapists in 
the area under consideration. It is acceptable to use all of the psychotherapists 
in one particular clinic in this latter case only if the assumption may be made 
that the therapists in the clinic represent a random sample of all the therapists 
in the area to which the results will be generalized. This case would con- 
stitute a mixed model. 













The likelihood-ratio test (which is closely related to maximum likelihood 
estimation) is widely used for testing hypotheses in statistics since this test 
has many optimum properties. To test the hypothesis that there is no difference
or variation among the row effects, or, what is equivalent, that there are
no row effects (all $\mu_{i..} = 0$), the likelihood-ratio test leads to the ratio (see
Table 1)

$$\frac{s_r^2}{s_w^2}, \tag{7}$$

which is distributed as F with (r − 1) and rc(m − 1) degrees of freedom
under the null hypothesis. [For proof see (8), pp. 59-60.]


Case (b): Linear hypothesis model; assumption of no interaction, but without 
preliminary test. 


As in case (a), we assume

$$X_{ijk} = \mu_{ij.} + \mu_{i..} + \mu_{.j.} + \mu + e_{ijk},$$

where $e_{ijk}$ are independent random variables, normally distributed with mean
equal to zero and variance equal to $\sigma^2$ (unknown). And also

$$\sum_{i=1}^{r} \mu_{ij.} = 0, \quad \sum_{j=1}^{c} \mu_{ij.} = 0, \quad \sum_{i=1}^{r} \mu_{i..} = 0, \quad \sum_{j=1}^{c} \mu_{.j.} = 0.$$

This time, however, we make the additional assumption that there are no
effects due to interaction; that is, $\mu_{ij.} = 0$ for all i = 1, ..., r and j = 1,
..., c. (Note that this assumption makes the two assumptions above in
regard to the summing of the interaction effects over i and j redundant.)
In this case, the likelihood-ratio theory leads to the following quotient
for testing the hypothesis that there are no row effects (or all $\mu_{i..} = 0$):

$$\frac{s_r^2}{s_{i+w}^2}, \tag{8}$$

which is also distributed as F, but with (r − 1) and (r − 1)(c − 1) +
rc(m − 1) degrees of freedom (if the null hypothesis is true). (For proof see
6, pp. 220-224.)

Thus we see how the appropriate term (according to the theory of likeli- 
hood-ratio tests) to be used in testing the existence of the row effects (or 
column effects by similar reasoning) depends on the accepted assumptions. 
If an experimenter’s data fit the assumptions behind the linear hypothesis 
analysis of variance model, and he makes no assumption as to the existence 
or non-existence of interaction, he uses (7); but if the experimenter can 
assume on the basis of some a priori reasoning that no interaction exists 
with data which fit the assumptions behind the linear hypothesis model, he 
uses (8). 
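
The two error terms can be compared in a brief Python sketch. The function name and the mean squares in the example are hypothetical; the sketch assumes only that the pooled error mean square is the sum of the interaction and within cells sums of squares divided by their combined degrees of freedom.

```python
def row_f_ratios(ms_rows, ms_inter, ms_within, r, c, m):
    """F ratios for the row effects under the two assumptions about interaction."""
    n1 = r - 1                    # rows
    n2 = (r - 1) * (c - 1)        # interaction
    n3 = r * c * (m - 1)          # within cells
    n4 = n2 + n3                  # interaction plus within cells
    ms_pooled = (n2 * ms_inter + n3 * ms_within) / n4
    return {
        # No assumption about interaction: within cells error term, as in (7).
        "no_assumption": (ms_rows / ms_within, n1, n3),
        # Interaction assumed absent: pooled error term, as in (8).
        "no_interaction": (ms_rows / ms_pooled, n1, n4),
    }


# Hypothetical mean squares for a 2 x 2 layout with 2 replications per cell.
ratios = row_f_ratios(ms_rows=18.0, ms_inter=8.0, ms_within=2.0, r=2, c=2, m=2)
print(ratios)
```

Each entry gives the F ratio followed by its numerator and denominator degrees of freedom, so the same data yield different test statistics, referred to different F distributions, depending on the assumption adopted.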
















Case (c): Linear hypothesis model; the use of a preliminary test. 


Now the question arises as to the advisability and acceptability of the 
procedure of testing the significance of the interaction term by means of 

$$\frac{s_i^2}{s_w^2} \tag{9}$$

(which will be referred to hereafter as the “preliminary test”) before determining
whether to use (7) or (8) for the final test. Thus, if the interaction
term is significant when tested by (9), the within cells mean square is used 
as the error term; if the interaction term is not significant when tested by 
(9), the sum of the interaction and within cells sums of squares divided by 
their combined degrees of freedom is used. 

This is a compromise procedure which was originally derived on an 
intuitive basis by applied scientists in an attempt to utilize their experimental 
results and past knowledge to raise the power of their statistical test. [The 
power of a test is defined as one minus the probability of a type II error. See 
(10, pp. 246-248) for a good discussion of type I errors, type II errors, and 
power in the analysis of variance.] With this procedure the two tests (pre- 
liminary and final F) are not statistically independent, since they are both 
made on the same set of data. Thus, certain dangers are introduced. 

This lack of independence and mathematical neatness has led mathe- 
matical statisticians to shy away from this area of application until very 
recently (and indeed some apparently still condemn this whole process of 
making the preliminary and final tests on the same set of data). Since 1944, 
Bancroft (2), Mosteller (11), Paull (12), and Bechhofer (3) have made 
important contributions to the problems involved in making preliminary 
tests of significance. The surface has as yet, however, barely been scratched. 

In accord with the current literature the three possible procedures will
be referred to as “never pool” (involving no assumption as to the existence
or non-existence of interaction with no preliminary test), “always pool”
(involving the assumption of no interaction with no preliminary test), and
“sometimes pool” (where the error term in the final F-test depends upon the
results obtained in a preliminary test of the significance of the interaction
mean square).

Before presenting the formal summarization of the rules of procedure, 
certain symbols for degrees of freedom will be defined to facilitate the dis- 
cussion. Accordingly, let 


$$\begin{aligned}
n_1 &= (r - 1), \\
n_2 &= (r - 1)(c - 1), \\
n_3 &= rc(m - 1), \\
n_4 &= (r - 1)(c - 1) + rc(m - 1);
\end{aligned}$$

and let $F(\alpha_z; n_i, n_j)$ refer to the value which is exceeded by F with probability
$\alpha_z$ under the null hypothesis for the degrees of freedom $n_i$ (numerator $\chi^2$)
and $n_j$ (denominator $\chi^2$); i.e.,

$$\Pr\{F > F(\alpha_z; n_i, n_j)\} = \alpha_z. \tag{10}$$

(The subscript of $\alpha$, that is, z, may be equal to 1, 2, or 3; the particular usage
will be explained later.)

For testing the row effects in the two-factor, m replications case the 
statistical procedure may be summarized as follows: 





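“Never Pool”
Reject $\mu_{i..} = 0$ if

$$\frac{s_r^2}{s_w^2} \ge F(\alpha_2; n_1, n_3).$$

Accept $\mu_{i..} = 0$ otherwise.

“Always Pool”
Reject $\mu_{i..} = 0$ if

$$\frac{s_r^2}{s_{i+w}^2} \ge F(\alpha_3; n_1, n_4).$$

Accept $\mu_{i..} = 0$ otherwise.

“Sometimes Pool”
Reject $\mu_{i..} = 0$ if

$$\frac{s_i^2}{s_w^2} \ge F(\alpha_1; n_2, n_3) \quad \text{and} \quad \frac{s_r^2}{s_w^2} \ge F(\alpha_2; n_1, n_3); \tag{11}$$

or if

$$\frac{s_i^2}{s_w^2} < F(\alpha_1; n_2, n_3) \quad \text{and} \quad \frac{s_r^2}{s_{i+w}^2} \ge F(\alpha_3; n_1, n_4).$$

Accept $\mu_{i..} = 0$ otherwise.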
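
The three rules can be expressed as a single decision function. The following Python sketch is illustrative only: the function name and the mean squares are hypothetical, and the critical values are supplied by the caller from an F table rather than computed. The values 7.71 and 6.61 used in the example are the approximate tabled 5 per cent points for (1, 4) and (1, 5) degrees of freedom.

```python
def sometimes_pool_rejects(ms_rows, ms_inter, ms_within, n2, n3,
                           f_prelim, f_never, f_pooled):
    """Return True if the hypothesis of no row effects is rejected.

    f_prelim, f_never, and f_pooled are the tabled critical values for the
    preliminary test, the unpooled final test, and the pooled final test.
    """
    ms_pooled = (n2 * ms_inter + n3 * ms_within) / (n2 + n3)
    if ms_inter / ms_within >= f_prelim:
        # Interaction significant: use the within cells mean square as error.
        return ms_rows / ms_within >= f_never
    # Interaction not significant: use the pooled mean square as error.
    return ms_rows / ms_pooled >= f_pooled


# Hypothetical mean squares with n2 = 1 and n3 = 4.
args = dict(ms_rows=18.0, ms_inter=8.0, ms_within=2.0, n2=1, n3=4,
            f_never=7.71, f_pooled=6.61)

sometimes = sometimes_pool_rejects(f_prelim=7.71, **args)
# A preliminary critical value of zero always advises against pooling ("never pool").
never = sometimes_pool_rejects(f_prelim=0.0, **args)
# An infinite preliminary critical value always advises pooling ("always pool").
always = sometimes_pool_rejects(f_prelim=float("inf"), **args)
print(sometimes, never, always)
```

In this hypothetical case the preliminary ratio (4.0) falls short of its critical value, so the pooled error term is used and the row effects are not declared significant, although the unpooled test would have rejected.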


Let us examine the advantages and disadvantages of each and the con- 
ditions under which each may be used. 

The “always pool” procedure (where the interaction effects are in fact
non-existent) provides a uniformly more powerful F-test than the “never
pool” procedure for equivalent type I errors. [A uniformly most powerful
test is one which is more powerful than all other possible tests (a test being 
defined by its critical region) regardless of the alternative to the null hypo- 
thesis which is assumed to be true.] If the “always pool” procedure is used 
and there actually are interaction effects, the denominator in the final F-test 
will tend to be too large, and the test will give too many non-significant 
results when in fact the null hypothesis is not true. Increase in interaction 
effects increases this distortion without limit, so that the research worker 
may be working at the, say, 1/500 per cent level of significance although he 
thinks he is working at the 5 per cent level. [See Table 2, p. 74 in Bechhofer 
(3) for an indication of how bad this disturbance gets under various con- 
ditions.] 










The “sometimes pool” procedure is an attempt to avoid errors of this
sort; the preliminary test is expected to advise against pooling when the
interaction is large. The “sometimes pool” procedure cannot be expected to
eliminate this source of error (or disturbance) entirely. But it is useful if it
keeps the type I error of the final F-test close to the level at which the
investigator thinks he is working. For equivalent type I errors this procedure
also makes the power of the final F-test greater than the power of the final
F-test under the “never pool” test.

It will be convenient for further exposition to introduce a term which 
summarizes the over-all magnitude of the interaction effects. Let this be 


ee (12) 
i=1 j=1 
$\lambda$ equals zero only when the interaction effects are all equal to zero; it gets
proportionately larger as the $\mu_{ij.}$ deviate from zero.

When $\lambda$ is large, power and error characteristics make the use of the
“sometimes pool” test theoretically unjustified (3). It is even more precarious
to use the “always pool” test under these conditions. Thus, when an investigator
has no a priori evidence to indicate a particular value for $\lambda$, he uses
the “never pool” test, the routine use of the “sometimes pool” procedure
being theoretically unacceptable.

In those cases in which the experimenter has definite a priori reasons for
the belief that $\lambda$ is equal, or at least close, to zero (that is, all $\mu_{ij.}$ approximately
equal to zero), and at the same time wants a certain amount of
protection from an inaccurate assumption, the use of the “sometimes pool”
procedure can be justified and is advantageous.

But more is involved than the mere caution that the use of the “sometimes
pool” procedure requires definite a priori evidence indicating zero
interaction. Since there is no preliminary test in the case of the “never pool”
and “always pool” tests, their power is completely determined once the
significance level of the final F-test is selected (for a given design, a specific
value of $\lambda$, and an assumed-as-true alternative to the main effects’ null
hypothesis). When the “sometimes pool” procedure is used, however, the
selection of a particular significance level for the final F-test merely limits
the power of the whole test; it does not completely determine this power.
The power (under the same conditions as above) is specified only when both
the final and preliminary significance levels for the F-test are established.

Every combination of preliminary and final significance levels (for
fixed degrees of freedom) within the general category of the “sometimes pool”
procedure yields a different test. The “always pool” and “never pool” tests
may simply be thought of as special (or extreme) cases of the “sometimes
pool” procedure, with preliminary significance level equal to zero and one,
respectively. Accordingly, as the preliminary significance level of a “sometimes
pool” procedure is decreased, it approaches an “always pool” test; as the
preliminary significance level is increased, the “sometimes pool” procedure
approaches a “never pool” test. Although the power of the entire test is
greater with smaller preliminary significance levels, these smaller levels
provide less protection from the disturbance resulting from an error in judgment
as to interaction (particularly at the intermediate values of $\lambda$). Conversely,
although there is more protection from the potential disturbance
in total test significance level with larger preliminary significance levels,
there is less gain in power over the corresponding “never pool” test.

[These relationships of power and total test type I error to the level of 
the preliminary test are not monotonic for all conditions. The best tests to 
be recommended in this paper have definite advantages in both power and 
error characteristics over many alternate tests. Nevertheless, the relation- 
ship indicated above does hold for wide and important (for protection 
purposes) ranges (a) in the magnitudes of the degrees of freedom, (b) in the 
selected final significance level, (c) in the value of $\lambda$, and (d) in the possible
alternative to the main effects’ null hypothesis.]

Bechhofer (3), for this model, and Paull (12), for the components of 
variance model, have worked out compromise tests which involve minimum 
danger of erroneous conclusions over the widest possible (for a uniform 
procedure) ranges in the values of interaction, the various degrees of freedom, 
and the possible alternatives to the main effects’ null hypothesis. The pro- 
cedure involves the free selection of the significance level of the final F-test; 
the preliminary significance level is established (by appropriate rules) so 
as to provide a test with the most desirable characteristics for the fixed final 
significance level. 

What follows in this section is directed toward elaborating the statistical 
rules for establishing, for a fixed final significance level, that preliminary 
significance level for F which leads to the specific “sometimes pool” test
with the most favorable power characteristics and minimum disturbance in 
significance levels for the linear hypothesis model. 

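Following Bechhofer (3), let

$$a = \left[\frac{(r - 1)(c - 1)}{rc(m - 1)}\right] F(\alpha_1; n_2, n_3), \tag{13}$$

$$b = \left[\frac{(r - 1)}{rc(m - 1)}\right] F(\alpha_2; n_1, n_3), \tag{14}$$

$$c = \left[\frac{(r - 1)}{(r - 1)(c - 1) + rc(m - 1)}\right] F(\alpha_3; n_1, n_4). \tag{15}$$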


For fixed degrees of freedom a is completely determined by $\alpha_1$, b by $\alpha_2$, and
c by $\alpha_3$. $\alpha_1$ is the level of significance to be used for the preliminary test.
$\alpha_2$ is the level of significance for the final F-test which the experimenter
would use if the preliminary test advises against pooling. $\alpha_3$ is the level of
significance for the final F-test which the experimenter would use if the
preliminary test recommends pooling. Notice that $\alpha_2$ defines the “never
pool” procedure when $\alpha_1 = 1$ and that $\alpha_3$ defines the “always pool” procedure
when $\alpha_1 = 0$. The above conditions define a “sometimes pool” test
whenever $0 < \alpha_1 < 1$, which is the situation that interests us at the moment.

Within the category of “sometimes pool” tests we make three distinctions
(3, p. 26):

$$\begin{aligned}
\text{class A tests when } & c > \frac{b}{a + 1}, \\
\text{borderline tests when } & c = \frac{b}{a + 1}, \\
\text{class B tests when } & c < \frac{b}{a + 1}.
\end{aligned} \tag{16}$$

The particular “sometimes pool” test is thus automatically determined once
the significance levels ($\alpha_1$, $\alpha_2$, and $\alpha_3$) are selected.

The foregoing exhaust all of the possible relationships which may exist
between $c$ and $b/(a + 1)$. These values are under the control of the
experimenter in the sense that he is free to choose the $\alpha$-levels of significance; the
latter uniquely (for fixed degrees of freedom) determine $a$, $b$, and $c$. The usual
procedure employed by investigators is to choose the same level of significance
for both the preliminary and the final F-tests; in this way the test is specified
uniquely (and without their knowledge) for them. Let us take an example.
Suppose an investigator chooses the 1 per cent level of significance for the
preliminary F-test and also for the final F-test, regardless of the outcome of
the preliminary test. Suppose also, in this example, that $r = 4$, $c = 5$, and
$m = 3$. The F-value at the one per cent level for $n_2 = (r-1)(c-1) = 12$
and $n_3 = rc(m-1) = 40$ is 2.66; for $n_1 = (r-1) = 3$ and $n_3 = rc(m-1) =
40$, this level is 4.31; for $n_1 = (r-1) = 3$ and $n_2 + n_3 = (r-1)(c-1) +
rc(m-1) = 52$, this level is 4.18. This gives














$$ a = \frac{(3)(4)}{(4)(5)(2)}\,(2.66) = .798, $$

$$ b = \frac{3}{(4)(5)(2)}\,(4.31) = .323, $$

$$ c = \frac{3}{(3)(4) + (4)(5)(2)}\,(4.18) = .241. $$














Since

$$ .241 > \frac{.323}{.798 + 1} \quad \left(\text{that is, } c > \frac{b}{a+1}\right), $$


this procedure amounted to a class A test. But, as Bechhofer (3, see par- 
ticularly Tables 2, 3, 4, 5A, 5B) has shown, class A tests do not have the 
most favorable combination of power and error characteristics for this model. 
Thus, without definite knowledge or awareness, the experimenter selected 
a generally inferior test by employing this rather widely used procedure. 
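The classification in this example can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original paper: the function names are hypothetical, and the three F-values are taken from the text (that is, from an F-table) rather than computed.

```python
def pooling_constants(r, c, m, f_prelim, f_never, f_always):
    """Constants a, b, c of equations (13)-(15), from tabled F-values."""
    n1 = r - 1                  # degrees of freedom, rows
    n2 = (r - 1) * (c - 1)      # degrees of freedom, interaction
    n3 = r * c * (m - 1)        # degrees of freedom, within cells
    a = n2 / n3 * f_prelim
    b = n1 / n3 * f_never
    c_const = n1 / (n2 + n3) * f_always
    return a, b, c_const


def classify_sometimes_pool(a, b, c_const, tol=1e-9):
    """Class of the test under (16): compare c with b / (a + 1)."""
    bound = b / (a + 1)
    if abs(c_const - bound) <= tol:
        return "borderline"
    return "A" if c_const > bound else "B"


# Example from the text: r = 4, c = 5, m = 3, one per cent level throughout.
a, b, c_const = pooling_constants(4, 5, 3, f_prelim=2.66, f_never=4.31, f_always=4.18)
print(round(a, 3), round(b, 3), round(c_const, 3),
      classify_sometimes_pool(a, b, c_const))
```

The tolerance `tol` is an assumption of the sketch; with tabled F-values rounded to two or three figures, an exact borderline case will rarely be hit by equality alone.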

After a very thorough evaluation of the power and error characteristics
of the various types of “sometimes pool” tests, Bechhofer concluded that the
borderline test was the over-all best bet in terms of relative assurance of
freedom from erroneous experimental conclusions. The borderline test does,
however, introduce a slight distortion in the whole test type I error when
$\lambda = 0$. That is, if $\alpha_2$ is the type I error of the “never pool” test which the
experimenter ordinarily would use, the borderline test defined by $b(\alpha_2)$ and
$c(\alpha_3)$ has the property that its type I error will be larger than $\alpha_2$ when
$\lambda = 0$.

To illustrate how much larger this type I error gets let us consider one of
Bechhofer’s examples (3, p. 81). For $\alpha_2 = \alpha_3 = 0.05$ and for $n_1 = 2$, $n_2 = 2$,
and $n_3 = 6$, the maximum type I error of the borderline test (that is, when
$\lambda = 0$) is 0.0653. This distortion is just about the worst that can be
encountered in the two-factor, m replications design since the type I error
decreases with either increasing $\lambda$ or with increasing $n_3$ (regardless of $\lambda$) and
approaches the limiting value of $\alpha_2$ ($= \alpha_3$) very rapidly.

Thus, as Bechhofer concludes, “There is strong justification for the use
of the borderline test under the circumstances specified. By tolerating a
small increase in size [type I error] the experimenter can achieve a relatively
large gain in power ... [when $\lambda = 0$]. He is protected against large ... ($\lambda$)
... since the power never will drop below the power of the conventional
‘never pool’ test he ordinarily would use.” (3, p. 112). The absolute gain in
power of the borderline test over the “never pool” test is a function of the
degrees of freedom, and is greatest for smaller values of $n_3$.

In addition to the advantage in gain of power of the borderline test, we 
saw in the preceding paragraph that its use brings freedom from any of the 
gross disturbances in type I error discussed in previous sections of this paper. 

The recommended procedure, then, where the experimenter has strong 
a priori reasons for believing that the interaction effect is zero (or very close 
to zero), and where he wants to have some protection from the catastrophic 
effects of a completely erroneous assumption, is as follows: 




1. Establish the level of significance ($\alpha$) for the final F-test which would ordinarily
be used for a “never pool” test, and set $\alpha_2 = \alpha_3 = \alpha$.

2. Determine $b$ from

$$ b = \frac{(r-1)}{rc(m-1)}\, F(\alpha_2; n_1, n_3), \quad \text{where } \alpha_2 = \alpha. $$

3. Determine $c$ from

$$ c = \frac{(r-1)}{(r-1)(c-1) + rc(m-1)}\, F(\alpha_3; n_1, n_2 + n_3), \quad \text{where } \alpha_3 = \alpha. $$

4. Determine $a$ from

$$ a = \frac{b}{c} - 1. \quad \text{(The borderline test.)} $$

5. Since $a$ is determined, the F-value for the preliminary test, which gives the most
effective “sometimes pool” test, may be found by

$$ F(\alpha_1; n_2, n_3) = \frac{rc(m-1)}{(r-1)(c-1)}\, a. $$

6. Now that the three F-values are established, we can proceed with the rule of
procedure defined previously for the “sometimes pool” test, with the understanding that
the type I error will be slightly larger than anticipated.





In the example given above, thus, instead of using the one per cent level 
of significance for both the preliminary and final F-tests one would proceed 
as follows (again r = 4,c = 5, m = 3): 


1. The one per cent level of significance will be used for the final test regardless of
outcome of the preliminary test.

2. $b = \frac{3}{40}\,(4.31) = .323$,
where 4.31 is the F-value at the 1 per cent level of significance for $n_1 = 3$ and $n_3 = 40$.

3. $c = \frac{3}{(3)(4) + (4)(5)(2)}\,(4.18) = .241$,
where 4.18 is the F-value at the 1 per cent level of significance for $n_1 = 3$ and $n_2 + n_3 = 52$.

4. $a = \frac{.323}{.241} - 1 = .340$.

5. $F(\alpha_1; 12, 40) = \frac{40}{12}\,(.340) = 1.133$.

6. Now we reject $\mu_{i\cdot\cdot} = 0$ if

$$ \frac{s_i^2}{s_w^2} \ge 1.133 \quad \text{and} \quad \frac{s_r^2}{s_w^2} \ge 4.31, $$

or if

$$ \frac{s_i^2}{s_w^2} < 1.133 \quad \text{and} \quad \frac{s_r^2}{s_{i+w}^2} \ge 4.18; $$

and accept $\mu_{i\cdot\cdot} = 0$ otherwise.
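The six steps of the borderline procedure reduce to a short calculation once the two final F-values have been read from a table. The sketch below is illustrative only: the function names are hypothetical, the mean squares passed to the decision rule would come from the experimenter's own analysis, and the pooled mean square is assumed to be the usual degrees-of-freedom weighted average of the interaction and within cells mean squares.

```python
def borderline_preliminary_f(r, c, m, f_never, f_always):
    """Preliminary F-value that makes the 'sometimes pool' test borderline."""
    n1 = r - 1
    n2 = (r - 1) * (c - 1)
    n3 = r * c * (m - 1)
    b = n1 / n3 * f_never                  # step 2
    c_const = n1 / (n2 + n3) * f_always    # step 3
    a = b / c_const - 1                    # step 4: c = b / (a + 1)
    return n3 / n2 * a                     # step 5: invert a = (n2 / n3) * F


def reject_no_row_effects(ms_rows, ms_inter, ms_within, n2, n3,
                          f_prelim, f_never, f_always):
    """Step 6 for the linear hypothesis model ('sometimes pool' rule)."""
    if ms_inter / ms_within >= f_prelim:
        # Preliminary test advises against pooling: within cells error term.
        return ms_rows / ms_within >= f_never
    # Assumed pooling formula: weight each mean square by its degrees of freedom.
    ms_pooled = (n2 * ms_inter + n3 * ms_within) / (n2 + n3)
    return ms_rows / ms_pooled >= f_always


f_prelim = borderline_preliminary_f(4, 5, 3, f_never=4.31, f_always=4.18)
print(round(f_prelim, 3))
```

Carrying full precision through steps 2 to 5 gives a preliminary F-value of about 1.135; the value 1.133 in the text results from rounding $b$, $c$, and $a$ to three figures at each step.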










II. Components of Variance Model

Again $X_{ijk}$ is the observation in the $i$th row ($i = 1, \ldots, r$), the $j$th
column ($j = 1, \ldots, c$), and for the $k$th replication ($k = 1, \ldots, m$). In this
model the row effects, the column effects, and the interaction effects are all
assumed to be random variables. And

$$ X_{ijk} = U_i + V_j + W_{ij} + Z_{ijk} \quad (i = 1, \ldots, r),\ (j = 1, \ldots, c),\ (k = 1, \ldots, m). \tag{17} $$

(Notice that there is no symbol representing the over-all value above,
since it is equal to zero for this model.)

$U_i$ = the $i$th row effect,
$V_j$ = the $j$th column effect,
$W_{ij}$ = the $i$th row by $j$th column interaction effect,
$Z_{ijk}$ = an error term.


[Roman letters represent the effects in this model; the Greek letter $\mu$ (in
various forms) represents the effects in the linear hypothesis model. This 
notation is in accord with the practice of mathematical statisticians to use 
Greek letters for parameters (population values) and Roman letters for 
random variables. In the components of variance model the variances involved 
are the parameters.] 

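We further assume all $U_i$, $V_j$, $W_{ij}$, and $Z_{ijk}$ are independent and
normally distributed, with the following population values:

$$
\begin{array}{lcc}
 & \text{Mean} & \text{Variance} \\
U_i & \xi_u & \sigma_u^2 \\
V_j & \xi_v & \sigma_v^2 \\
W_{ij} & \xi_w & \sigma_w^2 \\
Z_{ijk} & \xi_z & \sigma_z^2
\end{array} \tag{18}
$$

$$ \xi = \xi_u + \xi_v + \xi_w + \xi_z. $$

If we convert these to the values

$$ R_i = U_i - \xi_u, \tag{19} $$
$$ S_j = V_j - \xi_v, \tag{20} $$
$$ T_{ij} = W_{ij} - \xi_w, \tag{21} $$
$$ Q_{ijk} = Z_{ijk} - \xi_z, \tag{22} $$

the new terms are independent and normally distributed with the population
values

$$
\begin{array}{lcc}
 & \text{Mean} & \text{Variance} \\
R_i & 0 & \sigma_r^2 = \sigma_u^2 \\
S_j & 0 & \sigma_s^2 = \sigma_v^2 \\
T_{ij} & 0 & \sigma_t^2 = \sigma_w^2 \\
Q_{ijk} & 0 & \sigma_q^2 = \sigma_z^2
\end{array} \tag{23}
$$

$$ X_{ijk} = \xi + R_i + S_j + T_{ij} + Q_{ijk}, \tag{24} $$

$$ \sigma^2 = \sigma_u^2 + \sigma_v^2 + \sigma_w^2 + \sigma_z^2 = \sigma_r^2 + \sigma_s^2 + \sigma_t^2 + \sigma_q^2. \tag{25} $$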


As stated previously, the components of variance model differs from the 
linear hypothesis model in that the row and column effects are considered to 
be random variables, not fixed constants. That is, the individuals, tests, or 
situations which constitute the two factors in the components of variance 
model are randomly selected from two larger groups. 
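To make the model concrete, the following sketch draws one data set from the components of variance model, with each observation built as an over-all mean plus random row, column, interaction, and error terms, and then computes the four mean squares used in the tests discussed in this paper. It is an illustration under stated assumptions: the function names and all numerical values are hypothetical, and the standard deviations passed in are the square roots of the variance components.

```python
import random


def simulate(r, c, m, xi, sd_r, sd_s, sd_t, sd_q, seed=0):
    """Draw x[i][j][k] = xi + R_i + S_j + T_ij + Q_ijk with normal effects."""
    rng = random.Random(seed)
    row_eff = [rng.gauss(0, sd_r) for _ in range(r)]
    col_eff = [rng.gauss(0, sd_s) for _ in range(c)]
    x = []
    for i in range(r):
        row = []
        for j in range(c):
            t_ij = rng.gauss(0, sd_t)      # one interaction effect per cell
            row.append([xi + row_eff[i] + col_eff[j] + t_ij + rng.gauss(0, sd_q)
                        for _ in range(m)])
        x.append(row)
    return x


def mean_squares(x):
    """Row, column, interaction, and within cells mean squares (balanced data)."""
    r, c, m = len(x), len(x[0]), len(x[0][0])
    cell = [[sum(x[i][j]) / m for j in range(c)] for i in range(r)]
    row = [sum(cell[i]) / c for i in range(r)]
    col = [sum(cell[i][j] for i in range(r)) / r for j in range(c)]
    grand = sum(row) / r
    ss_rows = c * m * sum((v - grand) ** 2 for v in row)
    ss_cols = r * m * sum((v - grand) ** 2 for v in col)
    ss_inter = m * sum((cell[i][j] - row[i] - col[j] + grand) ** 2
                       for i in range(r) for j in range(c))
    ss_within = sum((v - cell[i][j]) ** 2
                    for i in range(r) for j in range(c) for v in x[i][j])
    return {
        "rows": ss_rows / (r - 1),
        "columns": ss_cols / (c - 1),
        "interaction": ss_inter / ((r - 1) * (c - 1)),
        "within": ss_within / (r * c * (m - 1)),
    }


data = simulate(r=4, c=5, m=3, xi=10.0, sd_r=1.0, sd_s=1.0, sd_t=0.5, sd_q=1.0)
print(mean_squares(data))
```

Running the simulation with a different `seed` draws a different set of rows and columns from their populations, which is the sense in which the effects are random rather than fixed.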

An example of this would be the following study to test the hypothesis 
that among all of the schools in a particular county there is a real difference 
in the average reading ability (within each school) of the third graders. For 
the study, r schools are selected at random from all of the schools in the 
county, c books are selected at random from all of the third-grade readers 
used in the county, and cm third-grade students are randomly chosen from 
each of the r schools. The students are each asked to read aloud 500 words 
from one of the readers; their average reading speeds are the recorded values. 
(The passages from the books are, incidentally, also chosen at random). 
There are, thus, $m$ observations in each cell representing the reading scores
of $m$ different children from the school in the $i$th row, each reading the book
in the $j$th column. It is assumed that the schools have what may be called
“contribution to reading speed” values for third graders of $U_1, U_2, \ldots, U_r$,
which constitute a sample of $r$ from a normal distribution of such values
among all of the schools. It is also assumed that the $c$ books have “ease of
reading” contribution magnitudes of $V_1, V_2, \ldots, V_c$; the latter
contributions constitute a sample of $c$ from a normal distribution, drawn
independently of the first values. If $r$ other schools had been chosen, the values
$U_1, U_2, \ldots, U_r$ would have been different, and if $c$ other third-grade readers
had been chosen the $V_1, V_2, \ldots, V_c$ would have been different. [We are
not interested in any “basic reading capacity” which the children may have 
independently of their particular school training in this design since the 
random sampling and placement of these children enables this capacity to 
make an equal contribution (on the average) to all of the sums of squares.] 













Thus, the results of this study may be generalized to the population as a 
whole (all the schools in the particular county in the case of the row effects— 
the test of the above hypothesis—and the entire population of third-grade 
readers used in the county in the case of the column effects). 

Notice that we referred to the $U_i$, $V_j$, and $W_{ij}$ as the row, column, and
interaction effects, respectively. As a result of this, the testing of the row
effects (for example) for significance in this model amounts to a test of an
hypothesis of the sort: In the population as a whole, from which the sample
was drawn, there is no difference in the row effects among the individual
elements. In the above example this means we test the hypothesis that there
is no difference in (or variation among) the “contribution to reading speed”
values of the various schools (the row effects) and/or that there is no
difference in (or variation among) the “ease of reading” contributing values of the
books (the column effects) in the two populations. Thus, with this model the
null hypothesis that the row effects do not vary in the population takes the
form $\sigma_u^2$ (or $\sigma_r^2$) $= 0$ ($\sigma_v^2 = \sigma_s^2 = 0$ in the case of the columns).

If we had called the $R_i$, $S_j$, and $T_{ij}$ the row, column, and interaction
effects, respectively, the type of hypothesis we test and the preliminary
assumption would be analogous to that of the linear hypothesis model.
First, we would test an hypothesis such as that there are no row effects, rather
than that there is no row effect difference or variation. This follows from the
fact that the $R_i$ (which have means equal to zero) are all identically zero and
thus non-existent when $\sigma_r^2 = 0$, since their distribution is concentrated at
the point zero. Also, our preliminary working equation for this model, showing
the additive composition of the observed values, would have included a term
representing an over-all value as in the linear hypothesis case. This term
would be $\xi$, and, as $\mu$ previously, it would represent an over-all or general
mean. Referring to the $U_i$, $V_j$, and $W_{ij}$ as the effects is consistent with
common practice and emphasizes the distinction between the effects of the
linear hypothesis model and those of the components of variance model, as
well as the treatment differences necessitated by this distinction.

As was the case with the linear hypothesis model, the choice of the
proper term for testing the null hypothesis (in this case $\sigma_u^2 = 0$) depends on
the assumptions made relative to the interaction effects. If the investigator
makes no assumptions as to the equivalence of the interaction effects he
tests the hypothesis $\sigma_u^2 = 0$ by

$$ \frac{s_r^2}{s_i^2}, $$

which is distributed as F only when $\sigma_u^2 = 0$, the numerator being too large
otherwise. (For a delineation of the proof see 10, pp. 345-346.) This is the
“never pool” procedure. Note that the interaction mean square is the proper
error term for this model under these conditions; the within cells mean
square used as the error term in the linear hypothesis model is not the correct
error term here.

If the investigator has ample reason to make the assumption that the
interaction effects are identical (that is, $\sigma_w^2 = 0$), he may use the “always
pool” procedure and test the hypothesis $\sigma_u^2 = 0$ by

$$ \frac{s_r^2}{s_{i+w}^2}. $$

[For the essential features of the proof see (10), pp. 345-346.] This is the
same test as used in the “always pool” procedure for the linear hypothesis
model. As in the case of the other model, too, this procedure provides a
uniformly more powerful test when the assumption is true.

Again there is motivation for the use of a preliminary test of significance 
by reason of doubt as to the validity of the assumption concerning the inter- 
action effects. Contrary to the linear hypothesis model, the use of the “always 
pool’ procedure, when there is in fact an interaction effect variation, results 
in the final F-test denominator tending to be too small, with the test giving 
too many significant results when the null hypothesis is true. With increase 
in interaction effect variations, this disturbance increases without limit as 
before, so that the experimenter may think he is operating at the 5 per cent 
level of significance while he may actually be operating at the, say, 37 per cent 
level. 

The same preliminary test is used as for the preceding model, namely,

$$ \frac{s_i^2}{s_w^2}. $$


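The rule of procedure for this model may be summarized as follows:

“Never pool”
Reject $\sigma_u^2 = 0$ if

$$ \frac{s_r^2}{s_i^2} \ge F(\alpha_2; n_1, n_2). $$

Accept $\sigma_u^2 = 0$ otherwise.

“Always pool”
Reject $\sigma_u^2 = 0$ if

$$ \frac{s_r^2}{s_{i+w}^2} \ge F(\alpha_3; n_1, n_2 + n_3). $$

Accept $\sigma_u^2 = 0$ otherwise.

“Sometimes pool”
Reject $\sigma_u^2 = 0$ if

$$ \frac{s_i^2}{s_w^2} \ge F(\alpha_1; n_2, n_3) \quad \text{and} \quad \frac{s_r^2}{s_i^2} \ge F(\alpha_2; n_1, n_2), $$

or if

$$ \frac{s_i^2}{s_w^2} < F(\alpha_1; n_2, n_3) \quad \text{and} \quad \frac{s_r^2}{s_{i+w}^2} \ge F(\alpha_3; n_1, n_2 + n_3). $$

Accept $\sigma_u^2 = 0$ otherwise.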
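The three rules differ only in which error term is placed under the row mean square, which a small function makes explicit. This is a sketch rather than a prescribed implementation: the function name and the example mean squares are hypothetical, the critical F-values are supplied by the user from a table, and the pooled mean square is assumed to be the degrees-of-freedom weighted average of the interaction and within cells mean squares.

```python
def reject_no_row_variation(ms_rows, ms_inter, ms_within, n2, n3, procedure,
                            f_never, f_always, f_prelim=None):
    """Decision for the row hypothesis in the components of variance model."""
    # Assumed pooling formula (weights are the degrees of freedom n2 and n3).
    ms_pooled = (n2 * ms_inter + n3 * ms_within) / (n2 + n3)
    if procedure == "never":
        return ms_rows / ms_inter >= f_never
    if procedure == "always":
        return ms_rows / ms_pooled >= f_always
    if procedure == "sometimes":
        if f_prelim is None:
            raise ValueError("the 'sometimes' procedure needs f_prelim")
        if ms_inter / ms_within >= f_prelim:
            # Preliminary test rejects: interaction is the error term.
            return ms_rows / ms_inter >= f_never
        return ms_rows / ms_pooled >= f_always
    raise ValueError("procedure must be 'never', 'always', or 'sometimes'")


# Hypothetical mean squares with n2 = 12 and n3 = 40.
for name in ("never", "always", "sometimes"):
    print(name, reject_no_row_variation(60.0, 12.0, 10.0, 12, 40, name,
                                        f_never=5.95, f_always=4.18,
                                        f_prelim=1.922))
```

Note that, unlike the linear hypothesis model, the unpooled branch here divides by the interaction mean square, not the within cells mean square.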



















Paull has found that a class A test has the most desirable properties 
for this mathematical model. This class A test, which is described below, 
is recommended ‘as one which tends to stabilize the disturbances at inter- 
mediate values of [the ratio of the expected value of the interaction mean 
square to the expected value of the within cells mean square] while still 
taking advantage of a considerable portion of the possible gain in power at 
values of [this ratio] near one” (12, p. 544). When this ratio, which is analogous
to the $\lambda$ of the linear hypothesis model, is large, there is little disturbance
with the “sometimes pool” tests, since pooling is almost never advised. 
Paull recommends this procedure as the best compromise between the lower 
preliminary F significance levels (5 per cent, etc.), which do little to counter- 
act the possible disturbance in the type I error of the final F-test, and the 
higher preliminary F significance levels (70 per cent, etc.) which provide 
little increase in power over the “never pool’’ procedure. 

As a matter of fact, the effective or total test significance level when
$\alpha_2 = \alpha_3 = 0.05$ and the preliminary F-test is made at the same level ($\alpha_1 =
0.05$) is considerably above 10 per cent for wide ranges in the value of the
ratio of the expected value of the interaction mean square to the expected 
value of the within cells mean square. 

The recommended class A procedure consists of pooling the interaction
and within cells mean squares when their ratio is less than $2F(.50; n_2, n_3)$;
that is, accept $\sigma_w^2 = 0$ if

$$ \frac{s_i^2}{s_w^2} < 2F(.50; n_2, n_3), $$

and then carry on with the final F-test using $s_r^2/s_i^2$ if $\sigma_w^2 = 0$ is not accepted,
and $s_r^2/s_{i+w}^2$ if $\sigma_w^2 = 0$ is accepted. [Fifty per cent points for the F-distribution
may be found in the tables compiled by Merrington and Thompson (9).]

Let us illustrate this procedure with fictitious data like that used to
show the linear hypothesis “sometimes pool” procedure (one per cent level
of significance for the final test with $r = 4$, $c = 5$, and $m = 3$). The F-value
for the 50 per cent level of significance with $n_2 = (r-1)(c-1) = 12$ and
$n_3 = rc(m-1) = 40$ is equal to .961; the F-value for the one per cent level
of significance with $n_1 = (r-1) = 3$ and $n_2 = (r-1)(c-1) = 12$ is
5.95; and the F-value for the one per cent level of significance with $n_1 = 3$
and $n_2 + n_3 = 52$ is 4.18.

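We will reject $\sigma_u^2 = 0$ if

$$ \frac{s_i^2}{s_w^2} \ge 1.922 \quad \text{and} \quad \frac{s_r^2}{s_i^2} \ge 5.95; $$

or if

$$ \frac{s_i^2}{s_w^2} < 1.922 \quad \text{and} \quad \frac{s_r^2}{s_{i+w}^2} \ge 4.18. $$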
















The constant multiplier 2 is arbitrary and a smaller value may be used 
where the experimenter is willing to sacrifice some power in the final F-test
for increased assurance against extreme disturbance in significance level. A 
simpler rule which, according to Paull, may be used when the degrees of 
freedom of both rows and columns are greater than 6 is to pool the interaction 
and within cells mean squares when their ratio is less than 2. This is approxi- 
mately equivalent to the above procedure, for large degrees of freedom, and 
does not necessitate reference to the F-table. 
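Paull's pooling decision, in both its tabled form and its simplified form, amounts to comparing one ratio with one threshold. The sketch below assumes a hypothetical function name and hypothetical mean squares; the 50 per cent point .961 is the tabled value quoted in the example above.

```python
def paull_should_pool(ms_inter, ms_within, f50=None, multiplier=2.0):
    """Pool when the interaction / within cells ratio falls below the threshold.

    With f50 given, the threshold is multiplier * F(.50; n2, n3);
    with f50 omitted, the simplified rule uses the multiplier alone.
    """
    threshold = multiplier if f50 is None else multiplier * f50
    return ms_inter / ms_within < threshold


# Tabled rule for n2 = 12, n3 = 40: threshold 2 * .961 = 1.922.
print(paull_should_pool(15.0, 10.0, f50=0.961))
# Simplified rule: pool when the ratio is less than 2.
print(paull_should_pool(15.0, 10.0))
```

Passing a `multiplier` smaller than 2 corresponds to the more conservative choice mentioned in the text, trading some power for protection of the significance level.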


III. Mixed Model


Let us examine the case where the experimental data fit neither the 
assumptions of the linear hypothesis model nor those of the components of 
variance model exclusively, but fit the assumptions of a combination of the 
two. This is commonly called a mixed model. 

We assume for the mixed model that the effects of one of the factors 
(say the columns) have been obtained by the random selection of elements 
representing that factor, while assuming that each element of the other 
factor (say the rows) has a constant effect which is typical for that element 
(i.e., no random sampling but rather fixed effects). 

We assume that each observation is composed as follows:

$$ X_{ijk} = \xi + \mu_i + S_j + T_{ij} + Q_{ijk}, \tag{27} $$

again ($i = 1, \ldots, r$), ($j = 1, \ldots, c$), ($k = 1, \ldots, m$) and where $\xi$, $S_j$, $T_{ij}$,
and $Q_{ijk}$ are derived in the same way as in the components of variance model
above. The $\mu_i$ are constants such that

$$ \sum_{i=1}^{r} \mu_i = 0. \tag{28} $$

$S_j$, $T_{ij}$, $Q_{ijk}$ are normal, independent, random variables with the population
values

$$
\begin{array}{lcc}
 & \text{Mean} & \text{Variance} \\
S_j & 0 & \sigma_s^2 \\
T_{ij} & 0 & \sigma_t^2 \\
Q_{ijk} & 0 & \sigma_q^2
\end{array} \tag{29}
$$

As in the case of the linear hypothesis model, we may wish to test the
hypothesis of no row effects ($\mu_i = 0$, for $i = 1, \ldots, r$); or, as in the case of
the components of variance model, we may wish to test the hypothesis that
the column effects are identical ($\sigma_s^2 = 0$). An example of this model was
presented incidentally above as a variation of the linear hypothesis example.

There are two schools of thought in the literature as to the proper error
term for testing the hypothesis of no random effect variation ($\sigma_s^2 = 0$ in the
case presented above) when $\sigma_t^2 \neq 0$. On the one hand are those represented by
Anderson and Bancroft (1, p. 340) who assume fixed interaction effects for all
of the observations encompassed by a given random main effect. That is, they
assume that the entire population of interaction effects for each of the random
elements is included in the sample since the entire population of elements
intersected by the random elements is so included. In the notation of this
paper this assumption implies (in the case where the column effects are
random and the row effects are fixed)

$$ \sum_{i=1}^{r} T_{ij} = 0. $$


Those represented by Mood (10, p. 348), on the other hand, assume that the 
interaction effects of the observations encompassed by both the random 
effects (columns above) and fixed effects (rows above) may be treated as 
random sampling variables exactly as in the case of the components of 
variance model. 

The model advocated by Mood leads to an expected value of the random
(main) effects mean square which includes the term $m\sigma_t^2$, while the model
advocated by Anderson and Bancroft leads to a random (main) effects mean
square whose expected value does not include $m\sigma_t^2$. Thus, the expected value
of the random column effects mean square above is $\sigma_q^2 + mr\sigma_s^2 + m\sigma_t^2$ under
the Mood assumption, and $\sigma_q^2 + mr\sigma_s^2$ under the Anderson and Bancroft
assumption. As a consequence of this difference, the proper error term for
testing the hypothesis of identical column elements (when $\sigma_t^2 \neq 0$) is the
within cells mean square in the case of the Anderson and Bancroft model
and the interaction mean square in the case of the Mood model.
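The reason each assumption calls for a different error term can be seen by comparing expected mean squares: the proper denominator is the one whose expected value equals that of the column mean square when the column variance is zero. The sketch below is illustrative; the function names and numerical values are hypothetical, and the expected values assumed for the two error terms (within cells: the error variance alone; interaction: error variance plus $m$ times the interaction variance) are the standard ones rather than quantities stated in this section.

```python
def expected_column_mean_square(r, m, var_s, var_t, var_q, assumption):
    """Expected column mean square under each mixed-model assumption."""
    base = var_q + m * r * var_s
    if assumption == "mood":
        return base + m * var_t          # includes the interaction component
    if assumption == "anderson-bancroft":
        return base                      # interaction component absent
    raise ValueError("assumption must be 'mood' or 'anderson-bancroft'")


def expected_error_mean_square(m, var_t, var_q, assumption):
    """Expected value of the error term matched to each assumption."""
    if assumption == "mood":
        return var_q + m * var_t         # interaction mean square
    if assumption == "anderson-bancroft":
        return var_q                     # within cells mean square
    raise ValueError("assumption must be 'mood' or 'anderson-bancroft'")


# Under the null hypothesis (var_s = 0) numerator and denominator agree.
for assumption in ("mood", "anderson-bancroft"):
    num = expected_column_mean_square(4, 3, 0.0, 0.5, 1.0, assumption)
    den = expected_error_mean_square(3, 0.5, 1.0, assumption)
    print(assumption, num, den)
```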

If the position of Mood is accepted (and this implies that the interaction 
effect resulting from the coming together of a specific random element and a 
specific fixed element shows random error variation) the rule of procedure 
for this model for testing both the hypothesis of no row effects ($\mu_i = 0$) and
the hypothesis of no column effect variation ($\sigma_s^2 = 0$) is identical to the rule
for the components of variance model. The pooling procedure recommended 
by Paull (12) is applicable to the mixed model under Mood’s assumption 
when the main effects to be tested include the random variable. 

The rule of procedure if the position of Anderson and Bancroft is accepted 
(which is consistent with the general scheme presented early in this paper 
and usually more defensible) differs from that of the components of variance 
model in only one respect. The error term for testing the hypothesis of no
column effect variation, when $\sigma_t^2 = 0$ is rejected, is $s_w^2$ instead of $s_i^2$. The rule
of procedure for testing the hypothesis of no row effect is identical to the
components of variance model (except, of course, that an hypothesis of the
sort $\mu_i = 0$ is accepted or rejected in the case of the mixed model).













Where one wishes to test the fixed main effects with the Mood model or 
where one wishes to test either of the main effects with the Anderson and 
Bancroft model, no specific recommendations for a preliminary significance 
level can be made at this time. The most satisfactory pooling procedure in 
terms of minimum disturbance or deviation in the significance level at which 
the experimenter thinks he is working and maximum power has, as yet, not 
been worked out under these conditions. All that has been said for the previous 
models regarding (a) the motivations for using each of the tests, and (b) 
the dangers and necessary cautions in using the “always pool” and ‘‘some- 
times pool’ procedures, applies equally to this model. It is as true for this 
model as it is for the other two that an investigator should not use either of 
these latter two procedures unless he has strong reason to believe that there 
are no interaction effects or interaction-effect differences, as the case may be. 


REFERENCES 


1. Anderson, R. L. and Bancroft, T. A. Statistical theory in research. New York: McGraw- 
Hill, 1952. 

2. Bancroft, T. A. On biases in estimation due to the use of preliminary tests of significance. 
Ann. math. Stat., 1944, 15, 190-204. 

3. Bechhofer, R. E. The effect of preliminary tests of significance on the size and power 
of certain tests of univariate linear hypotheses. Unpublished doctor’s dissertation, 
Columbia Univ., 1951. 

4. Edwards, A. L. Experimental design in psychological research. New York: Rinehart, 
1950. 

5. Guilford, J. P. Fundamental statistics in psychology and education. New York: 
McGraw-Hill, 1950. 

6. Johnson, P. O. Statistical methods in research. New York: Prentice Hall, 1949. 

7. McNemar, Q. Psychological statistics. New York: Wiley, 1949. 

8. Mann, H. B. Analysis and design of experiments. New York: Dover Publications, 1949. 

9. Merrington, M. and Thompson, C. M. Tables of percentage points of the inverted 
beta (F) distribution. Biometrika, 1943, 33, 73-88. 

10. Mood, A. M. Introduction to the theory of statistics. New York: McGraw-Hill, 1950. 

11. Mosteller, F. On pooling data. Jour. Am. stat. Assn., 1948, 43, 231-242. 

12. Paull, A. E. On a preliminary test for pooling mean squares in the analysis of variance. 
Ann. math. Stat., 1950, 21, 539-556. 


Manuscript received 1/29/54 


Revised manuscript received 4/27/54 











PSYCHOMETRIKA—VOL. 20, No. 1 
MARCH, 1955 


A RATIONAL CURVE RELATING LENGTH OF REST PERIOD 
AND LENGTH OF SUBSEQUENT WORK PERIOD 
FOR AN ERGOGRAPHIC EXPERIMENT* 


LEDYARD R TUCKER


PRINCETON UNIVERSITY 
AND 
EDUCATIONAL TESTING SERVICE 


A rational function is developed relating the length of a rest period and 
length of subsequent work period in an ergographic situation. Simple energistic 
postulates are used for a critical organ or neuromuscular structure whose 
failure to perform adequately results in a stoppage of the work period. Experi- 
mental results for two subjects using a finger ergograph indicate that the func- 
tion yields the general trend of the data but that there seem to be some 
systematic deviations of the data from the present rational function. One 
parameter determined from the data represents rate of recovery from moder- 
ate fatigue. It is hoped that this development will aid in studies of motor 
oo as related to such other variables as age, motivation, and effects 
of drugs. 


The idea for the present rational development occurred during a perusal 
of general literature on work decrement. A number of psychologists have used 
the ergograph in a variety of studies ranging from those concerned with per- 
sonality characteristics to those dealing with work in industry. While consid- 
erable progress has been made by physiologists on characteristics of active 
muscles and nerves, there seems to have been only moderate success in appli- 
cation of these physiological developments to the problems encountered by 
psychologists in dealing with behavior of integrated, intact individuals. 
Indeed, there are a number of instances where psychologists claim that be- 
havior such as exhibited with the ergograph cannot be accounted for on purely 
physiological and energistic grounds. The difficulty may be in finding how the 
various physiological details can be incorporated into descriptions of behavior 
of the complete individual. A second possibility is that psychologists have not
considered sufficiently simple and limited behavioral situations to observe the
physiological and energistic determiners of behavior. In the present case a few
simple energy relations are postulated which only approximate the relations
that might be determined on physiological grounds. These simple relations, 
however, permit development of a functional relation observable in the
performance of an individual in a limited ergographic experiment. Psychologists
may find the present development of use in studying more complex situations.

*This research was jointly supported in part by Princeton University and the Office
of Naval Research under contract N6onr 270-20.

After an individual has performed a constant, repetitive, motor task to 
such an extent that he no longer can continue, a rest period will result in the 
individual’s being able to perform the task again for some work period before 
again being unable to continue. A graph relating length of rest period and
length of subsequent work period will be of the form of those shown in Figure 1.
A similar result was obtained empirically by Manzer (2). Short rest periods
will be followed by short work periods, longer rest periods will be followed by
longer work periods. As the rest periods are lengthened, the subsequent work
periods should also lengthen, but to a progressively lesser extent until some
maximum length of work period is approached.

[FIGURE 1. Rest-Work Curves for Two Subjects. Two panels (Subject 1: C = .010,
B = -.76; Subject 2: C = .008, B = -.95), each plotting Length of Work, W, in
Seconds against Length of Rest, R, in Seconds, with the asymptote marked.]

In developing a rational function, several assumptions are made con- 
cerning energy relations within the organ or neuromuscular structure whose 
failure to function adequately is responsible for the work stoppage. The organ 
might be the working muscle, or it might be one of the nervous elements 
responsible for excitation of the muscle. We will consider at this time only the 
critical organ or structure whose failure to function adequately results in 
failure of the individual to perform the task. It is assumed that fatigue of other 
organs or structures will have little effect on length of the work period so long 
as these organs do function. This is probably an oversimplification of the 
situation. Interaction between organs or structures probably does occur such 
that fatigue in one results in greater expenditure of energy by others in order 
for the individual to continue the task. This interaction is being ignored in the 
present development. 

Consider an organ that is using energy at some constant rate during a 
working period. The supply of energy immediately available to the organ to be 
used in performing the task is being depleted. If this energy is being replen- 
ished at a slower rate than that at which it is being used, the supply of energy 
will be reduced. When the energy level falls to some critical point, the indi- 
vidual will be unable to continue the task and the work period will end. 

During a rest period the energy supply of the organ will be replenished to 
an extent dependent on the length of the rest period and the rate at which 
energy is being made available to the organ by the rest of the individual’s 
body. (For the present development, the nature of the physiological mechan- 
ism involved is not of immediate relevance.) At the end of such a rest period, 
the immediately available energy supply of the organ will again support per- 
formance of the task during a subsequent work period. 

Consider the following postulates and definitions. Let: 


$E_t$ = energy immediately available to organ at time $t$; (1)

$E_m$ = energy immediately available to organ when it is in a completely rested
state; (2)

$a$ = rate of expenditure of energy during work period (postulated to be a
constant); (3)

$C(E_m - E_t)$ = postulated rate at which body replaces energy to the organ; (4)

$W$ = length of work period; and (5)

$R$ = length of rest period. (6)


It is to be noted that postulate (3) forms a limit on the type of situation to 
which the present development is appropriate. The task should not be one for 
which the individual may work faster when more rested and then slow down 
when he becomes fatigued, nor should the task vary with the fatigue of the 
individual. The common type of ergographic series, where there may be long 
initial strokes followed by short strokes as the individual tires, is inappropriate 













for the present development. In an ergograph situation the strokes should be 
of constant length and made at constant timing. Inability to make a stroke of 
standard length is to be interpreted as failure to perform the task. Thus, the 
individual is not driven to such fatigue that he cannot make a stroke of any 
length; he just cannot make one of standard length. Even in this case, this 
assumption of a constant rate of expenditure of energy is probably an approxi- 
mation. 

Postulate (4) involves the simple concept that energy replacement occurs 
at a rate proportional to the extent of deficiency below a maximum amount of 
energy available. This maximum amount of energy available is that which 
would be present in a completely rested organ. $C$ is the constant of
proportionality. $(E_m - E_t)$ is the extent of energy deficiency. This postulate is
probably a gross approximation to a true function which could be determined from
physiological considerations, but it should be usable for cruder developments 
and for cases involving a limited task, such as the flexion of a finger. This 
postulate would probably be inappropriate for more extensive tasks involving 
a large proportion of the body. 
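Postulate (4) says that, during rest, the energy level changes at a rate proportional to its deficiency below the fully rested level. Solving that rate equation gives an exponential approach to the rested level, which the following sketch evaluates. The function name and the energy units (a critical level of 20 and a rested level of 100) are hypothetical; the rate constant .010 is the value of C shown for Subject 1 in Figure 1.

```python
import math


def energy_after_rest(e_start, e_max, c_rate, rest):
    """Energy after `rest` seconds, solving dE/dt = C * (E_m - E)."""
    return e_max - (e_max - e_start) * math.exp(-c_rate * rest)


# Recovery from a hypothetical critical level of 20 toward a rested level of 100.
for rest in (0, 30, 60, 120, 240):
    print(rest, round(energy_after_rest(20.0, 100.0, 0.010, rest), 1))
```

Each additional second of rest restores less energy than the one before it, which is the source of the flattening rest-work curve.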


[FIGURE 2. Relation Between Energy Available and Time During a Series of
Work and Rest Periods. Vertical axis: Energy Available, $E_t$; horizontal axis:
Time, $t$.]


Consider the curve of energy available versus time in Figure 2. At time
$t_0$ the energy level is $E_0$. As an initial work period progresses the energy level
decreases along the curve until it reaches the level $E_1$. This energy level $E_1$ is
the critical value between continuation in performance of the task and
discontinuation of performance. Whenever the energy level is below $E_1$ there is
insufficient energy available for the organ to continue its part in the
performance of the task. Whenever the energy level is above $E_1$ there is sufficient




















LEDYARD R TUCKER 55 


energy for the organ to continue its activity. The time interval for this initial 
work period is W, . 

A rest period of duration $R_1$ is now imposed and the energy level builds
up to $E_2$. During the following work period the energy level reduces to $E_3$, a
critical level between continued and non-continued performance of the task.
Since the task has not been altered, we might postulate that

$$E_3 = E_1. \tag{7}$$

The duration of this work period is $W_1$.

Consider that a second rest period of duration $R_2$ is imposed, which is followed
by a work period of duration $W_2$. The terminal energy level $E_5$ is again
the critical level between continued and non-continued performance of the
task, and, therefore, is equal to $E_3$ and $E_1$. Let

$$R_2 = R_1. \tag{8}$$

These two rest periods started with the identical energy levels $E_1$ and $E_3$ as
postulated in (7); thus, if the energy restoration conditions are identical as
postulated in (4), the energy levels at the end of these rest periods should be
identical, i.e.,

$$E_4 = E_2. \tag{9}$$

It might be expected, then, that the two subsequent work periods would be
identical also, i.e.,

$$W_2 = W_1. \tag{10}$$


This logic would lead to an expectation that a long sequence of rest-work 
periods with equal rest periods would have equal work periods. Yochelson (3) 
has reported data indicating such constancy in work periods following definite 
rest periods in long sequences of rest and work periods. Data gathered in the 
experimental try-out of this development also tended to support this con- 
tention. A finger ergograph working against spring tension with a fixed excur- 
sion to a block was used. The rate of contractions was set at one contraction 
per second. Preliminary trials revealed considerable initial practice effect, 
session to session, for a subject. During practice sessions some long sequences 
of rest and work periods with equal rest periods were tried out. The following 
results for the sixth practice session for one subject using a series of 60-second 
rest periods are typical. The lengths of the work periods, in seconds, were 54, 
36, 30, 30, 31, 30, 28, 28, 28, 28, 33, 31, 30, 34, 28, 28, 31, 39, 28, 28. The first 
one of 54 seconds in this series should not be counted. It corresponds to the
initial work period $W_0$ before any of the fixed rest periods and might be
expected to be long. The remaining work periods seem to vary within a fairly
constant band with no apparent progressive decrement. Presumably this will 
hold only for a finite time and the experiment should not involve excessive 
sessions. 
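
The claim of a fairly constant band with no progressive decrement can be checked numerically from the twenty work-period lengths quoted above. The following is only a rough sketch: the helper name `least_squares_slope` is introduced here for illustration, and a straight-line trend is merely one assumed way of looking for a decrement.

```python
from statistics import mean, stdev

# Work-period lengths (seconds) quoted above for the sixth practice session
# with 60-second rest periods; the first value is the initial work period.
work_periods = [54, 36, 30, 30, 31, 30, 28, 28, 28, 28,
                33, 31, 30, 34, 28, 28, 31, 39, 28, 28]
after_rest = work_periods[1:]  # drop the initial period, which precedes any fixed rest


def least_squares_slope(values):
    """Slope of the values regressed on their serial position 1, 2, ..., n."""
    n = len(values)
    x_bar = (n + 1) / 2
    y_bar = mean(values)
    sxy = sum((i - x_bar) * (y - y_bar) for i, y in enumerate(values, start=1))
    sxx = sum((i - x_bar) ** 2 for i in range(1, n + 1))
    return sxy / sxx


avg = mean(after_rest)
spread = stdev(after_rest)
slope = least_squares_slope(after_rest)
print(f"mean = {avg:.2f} s, sd = {spread:.2f} s, trend = {slope:+.3f} s per period")
```

A slope close to zero relative to the spread is consistent with the absence of a progressive decrement over the series.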













During the rest period $R_1$ the rate of change of energy with time can be
obtained from postulate (4). Only energy replacement is considered to be
active during the rest period.

$$\frac{dE_t}{dt} = C(E_m - E_t). \tag{11}$$

Integration yields

$$E_m - E_t = e^{-Ct+f}, \tag{12}$$

where f is a constant of integration. When the terminal times $t_1$ and $t_2$ and the
corresponding energy levels $E_1$ and $E_2$ are substituted into (12), one obtains

$$\frac{E_m - E_2}{E_m - E_1} = \frac{e^{-Ct_2+f}}{e^{-Ct_1+f}} \tag{13}$$

$$= e^{-Ct_2+f+Ct_1-f} \tag{14}$$

$$= e^{-C(t_2-t_1)}. \tag{15}$$

It is to be noted that the length of the rest period is

$$R = t_2 - t_1, \tag{16}$$

since $t_2$ and $t_1$ are the end and beginning times. The subscript to R is being
dropped for convenience. Then (15) can be written

$$\frac{E_m - E_2}{E_m - E_1} = e^{-CR}, \tag{17}$$

or, solving for $E_2$,

$$E_2 = E_m - (E_m - E_1)e^{-CR}. \tag{18}$$


Consider the subsequent work period. Energy is being used at a constant
rate as per postulate (3) as well as being replenished as per postulate (4). Thus,

$$\frac{dE_t}{dt} = -a + C(E_m - E_t). \tag{19}$$

Integration yields

$$E_m - E_t - \frac{a}{C} = e^{-Ct+g}, \tag{20}$$

where g is a constant of integration. Substitution of limiting times $t_2$ and $t_3$ and
energy levels $E_2$ and $E_3$ and writing a ratio yields

$$\frac{E_m - E_3 - \frac{a}{C}}{E_m - E_2 - \frac{a}{C}} = \frac{e^{-Ct_3+g}}{e^{-Ct_2+g}} \tag{21}$$

$$= e^{-C(t_3-t_2)} \tag{22}$$

$$= e^{-CW}, \tag{23}$$

where

$$W = t_3 - t_2. \tag{24}$$

Substituting $E_1$ for $E_3$ as per (7) and solving for $E_2$ yields

$$E_2 = E_m - \frac{a}{C} - \left(E_m - E_1 - \frac{a}{C}\right)e^{CW}. \tag{25}$$


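In relating the work period and rest period, the two expressions for $E_2$ in
(18) and (25) are equated to yield

$$E_m - (E_m - E_1)e^{-CR} = E_m - \frac{a}{C} - \left(E_m - E_1 - \frac{a}{C}\right)e^{CW}. \tag{26}$$

Subtracting $E_1$ from both sides of the equation,

$$E_m - E_1 - (E_m - E_1)e^{-CR} = E_m - E_1 - \frac{a}{C} - \left(E_m - E_1 - \frac{a}{C}\right)e^{CW}. \tag{27}$$

Or,

$$(E_m - E_1)(1 - e^{-CR}) = \left(E_m - E_1 - \frac{a}{C}\right)(1 - e^{CW}), \tag{28}$$

$$\frac{E_m - E_1}{E_m - E_1 - \frac{a}{C}}\,(1 - e^{-CR}) = (1 - e^{CW}). \tag{29}$$

Define

$$B = \frac{E_m - E_1}{E_m - E_1 - \frac{a}{C}}. \tag{30}$$

Then

$$B(1 - e^{-CR}) = (1 - e^{CW}). \tag{31}$$

Or, solving for $e^{CW}$,

$$e^{CW} = 1 - B + Be^{-CR}. \tag{32}$$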













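Thus $e^{CW}$ is linearly related to $e^{-CR}$ with a slope of B and intercept of $1 - B$.
It is interesting to note that when the numerator and denominator of the
right side of (30) are multiplied by C,

$$B = \frac{C(E_m - E_1)}{-a + C(E_m - E_1)}. \tag{33}$$

Thus, from (11) and (19), with both rates evaluated at the critical level $E_1$,

$$B = \frac{\left.\dfrac{dE_t}{dt}\right|_{E_t = E_1}\ \text{(for the rest period)}}{\left.\dfrac{dE_t}{dt}\right|_{E_t = E_1}\ \text{(for the work period)}}. \tag{34}$$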

Another point of interest is that the relation given in (31) or (32) does not 
involve directly the amount of energy expended. Only measures of duration of 
rest and work periods need be determined. It is not necessary to observe the 
energy expended as is frequently attempted in ergographic experiments by 
computing the work performed by the muscle. (In case the muscle is not the 
critical organ responsible for the work stoppage, the work performed by the 
muscle would not be equal to the energy expenditure to be considered in case 
it were necessary to determine the constant a.) This fortunate feature is due to 
the restriction to a situation for which there is a constant rate of energy ex- 
penditure for the critical organ. 
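
Because (32) involves only durations, a predicted work period can be computed directly from a rest period once B and C are known. The sketch below is illustrative only: the function name `work_period` is introduced here, and the constants are the graphical estimates for subject 1 reported later in this article ($B = -.76$, $C = .010$).

```python
import math


def work_period(rest, B, C):
    """Solve (32), exp(C*W) = 1 - B + B*exp(-C*rest), for the work period W."""
    return math.log(1.0 - B + B * math.exp(-C * rest)) / C


# Graphical estimates for subject 1 (times in seconds).
B_1, C_1 = -0.76, 0.010
for rest in (0, 20, 60, 120, 240):
    print(f"R = {rest:3d} s  ->  predicted W = {work_period(rest, B_1, C_1):5.1f} s")
```

With no rest the predicted work period is zero, as it must be when the organ starts at the critical level $E_1$, and the prediction rises toward a ceiling as the rest period lengthens.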

Three experimental sets of data were obtained. All used the finger ergo- 
graph previously described, which involved a spring load rather than weights. 
The excursion of the finger tip was limited by a block. The rate of finger con- 
tractions was set at one per second in all three cases. On each contraction, the 
limit block was to be touched. Failure to make a complete stroke ended each 
work period. In each experiment one subject was used for a number of sessions. 
Each session was composed of a “warm-up” period involving three work 
periods separated by 60-second rest periods. The first experimental rest period 
followed immediately the last warm-up work period. In the experimental 
session proper, a sequence of rest, work, rest, work, etc. periods was used.
Instead of having a sequence of equal rest periods and thus determining one 
point on the rest-work curve at each session, each of the selected rest periods 
was used once at each session and the subsequent length of work period was 
determined. The order of rest periods was varied between sessions. Mean 
length of work periods following each length of rest period was determined for 
each subject. 

Data for the initial subject are not presented here because he showed 
considerable practice effect from session to session. The subject performed 
about twice as much work in the fourth session as in the first session. The 
other two subjects received more extensive practice sessions and showed less
increase in work performed during the experimental sessions. Results for the
preliminary subject were analyzed and the curve of (32) fits these data about
as well as it does the data for the two subjects reported here.

Mean lengths of work periods following the chosen rest periods are given
in Table 1.


TABLE 1
Experimental Results
(All times are in seconds.)

          Subject 1                            Subject 2
Length of       Mean Length of       Length of       Mean Length of
Rest Period     Work Period*         Rest Period     Work Period**
     5               5.8                  5              10.0
    10               9.9                 10              14.7
    20              14.9                 20              21.5
    40              23.5                 40              31.8
    60              27.8                 60              39.7
    90              37.5                 80              45.2
   120              41.5                100              51.0
   160              48.1                120              54.0
   200              50.0                180              67.0
   240              52.9                240              77.8

* Mean length of work periods over 8 sessions.
** Mean length of work periods over 6 sessions.


The values of B and C for each subject were determined graphically.
In cases where more precise determinations of these constants are desired,
some more precise statistical method of curve fitting might be used. In the
present case we were interested in obtaining only the proper order of magnitudes
of B and C and felt that there was an advantage in the graphical method
in surveying the properties of the function and the data. A series of trial values
of C was assumed. For each value of C, values of $e^{CW}$ and $e^{-CR}$ were obtained.

FIGURE 3
Rectification of Rest-Work Curve for Subject 1
(Plots of $e^{CW}$ against $e^{-CR}$ for three trial values of C.)

Figure 3 shows graphs for subject 1 for three values of C. Each point is 
determined by one rest period and the subsequent work period. From (32) it 
is expected that $e^{CW}$ and $e^{-CR}$ would be linearly related
for the proper value of C. Analysis of (32) also indicates that this line should 
pass through the point (1, 1). All three lines drawn in Figure 3 pass through 
this point. It is to be noted in Figure 3 that a low value of C yields a negative 
curvature and a high value of C yields a positive curvature. A C of .010 seemed 
to yield the best approximation to a straight line. A best-fitting line was 
drawn by eye with a slope of —.76, thus determining B. The line drawn on 
Figure 1 for subject 1 is the line for (32) with the values of B and C determined 
above. 
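
The graphical rectification can be imitated numerically. The sketch below takes the subject 1 means from Table 1, transforms them for a few trial values of C, and fits a least-squares line constrained to pass through (1, 1), as (32) requires. The least-squares criterion and the function names are assumptions made for illustration, since the line in the article was drawn by eye. Of the trial values, only .010 is the one reported in the text; the others are arbitrary.

```python
import math

# Subject 1 means from Table 1 (seconds): rest periods and following work periods.
REST = [5, 10, 20, 40, 60, 90, 120, 160, 200, 240]
WORK = [5.8, 9.9, 14.9, 23.5, 27.8, 37.5, 41.5, 48.1, 50.0, 52.9]


def rectify(C):
    """Return the transformed coordinates x = exp(-C*R) and y = exp(C*W)."""
    xs = [math.exp(-C * r) for r in REST]
    ys = [math.exp(C * w) for w in WORK]
    return xs, ys


def slope_through_unit_point(xs, ys):
    """Least-squares slope of a line forced through the point (1, 1)."""
    num = sum((x - 1.0) * (y - 1.0) for x, y in zip(xs, ys))
    den = sum((x - 1.0) ** 2 for x in xs)
    return num / den


def fit(C):
    """Slope B and residual sum of squares for one trial value of C."""
    xs, ys = rectify(C)
    b = slope_through_unit_point(xs, ys)
    rss = sum((y - (1.0 + b * (x - 1.0))) ** 2 for x, y in zip(xs, ys))
    return b, rss


for C in (0.007, 0.010, 0.013):  # illustrative trial values
    b, rss = fit(C)
    print(f"C = {C:.3f}: slope B = {b:+.3f}, residual SS = {rss:.5f}")
```

For C = .010 the constrained slope comes out close to the value of −.76 obtained by eye.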


FIGURE 4
Rectification of Rest-Work Curve for Subject 2
(Plots of $e^{CW}$ against $e^{-CR}$ for three trial values of C.)


Figure 4 shows graphs of $e^{CW}$ and $e^{-CR}$ for subject 2 and three trial values
for C. While deviations from a straight line do not seem extreme, it is of inter- 
est that the points cannot be brought into more or less random fluctuations 
around a straight line by any choice of C. There seems to be a systematic 
wave about the straight line in each graph. This type of curve may result from 
the inadequacies of our approximations. A slight suggestion of this effect may 
be detected also for subject 1 in Figure 3. These results seem to be related to 
the results reported by Féré (1) and by Manzer (2), where performances after 
moderate rest periods were superior to performances after longer rest periods. 
While the consistency and seriousness of this lack of fit of the present function 
is a matter for further study, it is the author’s judgment from the present data 
that (32) yields the general sweep of the observations. Some different set of 
postulates might yield a better fit to the data, or the systematic deviations 
may be considered as perturbations to be accounted for by further complexities 
in the mathematical model. 













Returning to the fitting of (32) to the results for subject 2, values of .008 
for C and —.95 for B seemed to give the best fit to the data. The corresponding 
curve is drawn in Figure 1. 

An asymptote is indicated for each curve in Figure 1. This asymptote 
may be determined by setting R equal to infinity in (32). Then $e^{-CR}$ is zero
and $e^{CW}$ equals $(1 - B)$. In the experiment, the first work period exceeded
this asymptote by some 10 to 20 per cent. This is a second indication of an 
inadequacy of our formulation which might be corrected by a more complex 
set of postulates. Another possibility is to interpret the present function to 
apply to the body state following the warm-up period in the experimental 
sessions. 
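
As a worked illustration of the asymptote, setting $e^{-CR} = 0$ in (32) and taking logarithms gives the limiting work period. The numerical values below are computed here from the graphical estimates of B and C quoted above and are rounded; they are not figures reported in the article.

$$W_\infty = \frac{\ln(1 - B)}{C}, \qquad \text{subject 1: } \frac{\ln 1.76}{.010} \approx 56.5 \text{ sec.}, \qquad \text{subject 2: } \frac{\ln 1.95}{.008} \approx 83.5 \text{ sec.}$$

Both limits lie somewhat above the longest mean work periods in Table 1 (52.9 and 77.8 seconds after 240-second rests), as would be expected if the curves are still rising at that point.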

Future work with the rational rest-work function can take any of several 
lines aside from development of a more adequate (and probably more complex) 
function. Individual differences in values of C and B for a fixed experiment 
might be correlated with other variables such as age. Effects of such conditions 
as ventilation, use of drugs, motivation, and response modality on C and B 
could be investigated. The experiment could be expanded to include systematic 
variation in load on the ergograph and timing of flexions, thus investigating 
other characteristics of the present function when $a$ (rate of energy expenditure)
and $E_1$ (critical energy level) are varied. It would be hoped that use of
the rational function for the rest-work curve would help in obtaining greater 
precision in results for these various types of investigations. 


REFERENCES 
1. Féré, Charles. Travail et plaisir. Paris: Felix Alcan, 1904.
2. Manzer, C. W. An experimental investigation of rest pauses. Arch. Psychol., 1927, 90,
1-84.
3. Yochelson, S. Effects of rest-pauses on work decrement. Unpublished Ph.D. dissertation,
Yale University, 1930.




Manuscript received 2/2/54 


Revised manuscript received 4/27/54 











PSYCHOMETRIKA—VOL. 20, No. 1 
MARCH, 1955 


A MEASURE OF INTERRELATIONSHIP FOR OVERLAPPING 
GROUPS* 


Ben J. WINER 


PURDUE UNIVERSITY 


A coefficient of interrelationship between overlapping groups that 
avoids indeterminacies inherent in the construction of fourfold tables for 
such purposes and, at the same time, is relatively insensitive to the absolute 
magnitude of marginal totals of fourfold tables, is developed. Under assump- 
tions that are consistent with the objectives of organizational analysis, this 
coefficient is shown to be equivalent to a product-moment correlation coefficient.
The advantages and limitations of this coefficient are pointed out.
A numerical example giving computational procedures is presented. 


Suppose an organization G of individuals is made up of k overlapping
groups $g_1, g_2, \ldots, g_k$. No restriction is placed upon the number of groups
to which an individual may belong. Problems arise in analyzing the structure 
of organizations in which the equivalent of a correlation coefficient is needed. 
For example, one may seek means for simplifying the group structure of a 
complex organization. If one could construct the equivalent of an inter- 
correlation matrix, the factorial structure of this matrix might suggest 
means for restructuring the groups within the organization in such a way
as to preserve many of the optimal conditions that may be present in the 
more complex structure and, at the same time, suggest ways of reducing the 
number of groups necessary to accomplish the same general mission. 

As a starting point for this analysis, suppose one had the matrix of
observations X, whose elements $n_{ij}$ represent the number of individuals in
G who belong to both $g_i$ and $g_j$, i.e.,

$$X = \begin{bmatrix} n_{11} & n_{12} & \cdots & n_{1k} \\ n_{21} & n_{22} & \cdots & n_{2k} \\ \vdots & \vdots & & \vdots \\ n_{k1} & n_{k2} & \cdots & n_{kk} \end{bmatrix}. \tag{1}$$

(Joint occurrence matrices of higher order will not be considered in the
present development. Third-order matrices would have as elements $n_{ijm}$,
i.e., the number of individuals who are members of $g_i$, $g_j$, and $g_m$ simultaneously.)

*This measure was developed in connection with a study made by Dorothy C. Adkins
(1). Her influence in the [illegible] was felt in many ways. The article was prepared while
the author was employed at The University of North Carolina.

The number of individuals in each of the groups, $n_{ii}$, may show considerable
variation. Up to a certain point, for purposes of analyzing organizational
structure, the relative magnitudes of the $n_{ii}$ are not particularly
important. The coefficient of correlation sought, therefore, is one that is
relatively insensitive to the magnitudes of the $n_{ii}$.

One of the simplest approaches to the problem would be to define the
correlation between $g_i$ and $g_j$ by

$$r_{ij} = \frac{n_{ij}}{\sqrt{n_{ii}}\,\sqrt{n_{jj}}}. \tag{2}$$

Although this definition of a correlation coefficient is equivalent to a product-moment
correlation coefficient under rather general conditions, it has the
disadvantage of being quite sensitive to the relative magnitudes of $n_{ii}$ and
$n_{jj}$. This disadvantage rules against its adoption.

An alternative approach might be to set up a fourfold table in order to 
compute phi coefficients, tetrachorics, or other types of coefficients of the 
same ilk. There are many possible fourfolds that can be constructed, de- 
pending upon what total of the marginal entries is considered appropriate. 
Adkins (1) humorously discusses the difficulties that arise when one starts 
to construct this type of fourfold table; in a sense, the minus-minus cell of 
this type of fourfold is indeterminate. One possible fourfold table might be

$$
\begin{array}{c|cc|c}
 & g_j:\ - & g_j:\ + & \text{Total} \\ \hline
g_i:\ + & n_{ii} - n_{ij} & n_{ij} & n_{ii} \\
g_i:\ - & N - n_{ii} - n_{jj} + n_{ij} & n_{jj} - n_{ij} & N - n_{ii} \\ \hline
\text{Total} & N - n_{jj} & n_{jj} & N
\end{array}
$$

where N is the total number of individuals in G. If N is very much larger
than $n_{ii} + n_{jj}$, most of the frequency would be concentrated in a single cell
of the fourfold (the minus-minus cell). Phi coefficients computed from this
type of fourfold are in general quite small in magnitude, have upper limits
that are functions of the magnitude of the ratio $n_{ii}/n_{jj}$, and are quite sensitive
to the magnitudes of $n_{ii}$ and $n_{jj}$ relative to N. For purposes of analyzing
underlying group structure, a phi coefficient obtained from this fourfold
table would be of highly limited value. A tetrachoric coefficient computed
from this type of fourfold would not have so many limitations as does the
phi coefficient; such coefficients are, however, also sensitive to small changes
in $n_{ii}$ and $n_{jj}$. In general, one is quickly led to the conclusion that a fourfold
table does not provide a satisfactory starting point for an index of overlapping
group structure. [For a more detailed discussion of problems encountered
in using fourfold tables in related areas see (2, 3, 4).]
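
The dependence of the phi coefficient on N can be made concrete with a small computation. The sketch below builds the fourfold table shown above from two group sizes, their overlap, and a total N, and evaluates phi by the usual formula. The function name and the counts are illustrative assumptions, not values taken from the article's discussion.

```python
import math


def phi_from_overlap(n_ii, n_jj, n_ij, N):
    """Phi coefficient of the fourfold table built from group sizes and overlap."""
    a = n_ij                      # members of both groups (plus-plus cell)
    b = n_ii - n_ij               # members of g_i only
    c = n_jj - n_ij               # members of g_j only
    d = N - n_ii - n_jj + n_ij    # members of neither group (minus-minus cell)
    return (a * d - b * c) / math.sqrt(n_ii * (N - n_ii) * n_jj * (N - n_jj))


# Same two groups and the same overlap, with two different choices of the total N.
for N in (5000, 50000):
    print(N, round(phi_from_overlap(200, 2000, 20, N), 3))
```

With these counts the coefficient even changes sign when only N is altered, which illustrates why the minus-minus cell makes such a table an unsatisfactory starting point.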

An index which, in a real sense, is equivalent to a product-moment
correlation coefficient can be derived from the observation matrix X by a
series of assumptions that are consistent with the objectives of the subsequent
organizational analysis. As a first step in this derivation, one sets up a matrix
P having as elements

$$p_{ij} = n_{ij}/n_{ii}.$$

If $D_1$ is a diagonal matrix having as diagonal elements $n_{ii}$, the matrix P is
given by

$$P = D_1^{-1}X. \tag{3}$$

In general, P will not be a symmetric matrix. The elements of P represent
the proportion of the members of $g_i$ that also belong to $g_j$.

As a second step in the derivation, let $D_2$ be a diagonal matrix having
as diagonal elements the lengths of the row vectors of P, i.e., an element
of $D_2$ is given by $\sqrt{\sum_j p_{ij}^2}$. In order to normalize the rows of P, premultiply
P by $D_2^{-1}$ to obtain

$$F = D_2^{-1}P = D_2^{-1}D_1^{-1}X = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_k \end{bmatrix}, \tag{4}$$

where $a_i$ is a normalized row vector. The rows of F can be thought of as unit
vectors in a k-dimensional vector space. Assuming that the basis vectors
for this space are unit orthogonal vectors (justification of this assumption
will be given presently), the elements of the matrix

$$R = FF' \tag{5}$$

represent the cosines of the angles between pairs of vectors. The correlation
between $g_i$ and $g_j$ can be defined by

$$r_{ij} = \cos(a_i, a_j) = a_i \cdot a_j, \tag{6}$$

i.e., the scalar product of two row vectors having unit length.
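In the language of factor analysis

$$F = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1k} \\ a_{21} & a_{22} & \cdots & a_{2k} \\ \vdots & \vdots & & \vdots \\ a_{k1} & a_{k2} & \cdots & a_{kk} \end{bmatrix} \tag{7}$$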











can be interpreted as a matrix of factor loadings. The rows of this matrix
can be considered to represent the projections of the groups on a set of
orthogonal reference vectors. If $g_i$ had no overlap with any of the other
groups, the ith column of F would have unity in the ith row and zeroes
elsewhere. Hence the orthogonal reference vectors, or factors, represent the
location of ideal, non-overlapping groups in the k-space. This interpretation
of the factors can be considered a possible justification for the choice of an
orthogonal set of reference vectors rather than an oblique set. The effective
dimension of the common factor space may be less than k; indeed, it is the
objective of studies in this area to find this reduced dimension.

TABLE 1
Matrix X = Number of Individuals in $g_i$ Who Are also in $g_j$

  g       1      2      3      4      5      6
  1     200     50    100     50    150     20
  2      50    200    100     50     50    100
  3     100    100    400     50     50    100
  4      50     50     50    400    300    100
  5     150     50     50    300    800    200
  6      20    100    100    100    200   2000

TABLE 2
Matrix P = Proportion of Individuals in $g_i$ Who Are also in $g_j$

  g       1      2      3      4      5      6     $d_i^2$
  1    1.00    .25    .50    .25    .75    .10    1.9475
  2     .25   1.00    .50    .25    .25    .50    1.6875
  3     .25    .25   1.00    .12    .12    .25    1.2163
  4     .12    .12    .12   1.00    .75    .25    1.6682
  5     .19    .06    .06    .38   1.00    .25    1.2502
  6     .01    .05    .05    .05    .10   1.00    1.0176

TABLE 3
Matrix F = Matrix of Unit Row Vectors Giving Representation of $g_i$ in Terms of an Orthogonal Basis
(Entries illegible in the source.)

TABLE 4
Matrix R = FF'
(Entries illegible in the source.)

TABLE 5 (Short cut)
XX'

              1         2         3         4         5          6
  1       77,900    42,000    77,000    84,500   176,500     94,000
  2                 67,500    80,000    62,500    97,500    246,000
  3                          195,000    75,000   115,000    267,000
  4                                    267,500   392,500    311,000
  5                                              797,500    603,000
  6                                                       4,070,400
  √Diagonal  279.106   259.808   441.588   517.204   893.029   2017.523

TABLE 6 (Short cut)
Matrix of Divisors of Elements of XX'
(Entries illegible in the source.)

Still another approach to the derivation of this proposed index of over- 
lapping group structure suggests possible limitations implicit in it. Consider 
the ith row of F as representing the profile of the joining behavior of the 
average individual in g; . Then the index of relationship derived here can be 
interpreted as a measure of profile similarity between two average individuals. 
(The deviations of each of the profiles are essentially measured from a common
base line in terms of a metric relatively insensitive to $n_{ii}$.) For those groups
which are homogeneous with respect to belonging behavior (i.e., high intra- 
class correlation), this average individual closely approximates all individuals 
in the group. Where the groups are not particularly homogeneous with 
respect to belonging behavior, this average profile (and the index of relation- 
ship based upon it) has somewhat limited value. 

As a purely descriptive coefficient, the question of sampling distribution 
does not arise. The exact sampling distribution of the coefficient proposed 
raises a difficult problem in multivariate analysis. If the number of groups is 
large and the number of individuals within each group is also large, in spite 
of the fact that the coefficient can assume only positive values, it appears 
reasonable to assume that the sampling distribution of the multiple correla- 
tion coefficient provides an approximate sampling distribution for the index 
proposed here. 


Numerical Example 


An interesting application of the proposed index is given in (1). A 
smaller numerical application is presented here. Suppose an organization G 
consists of 5000 members. Further, suppose each member of G may belong 
to any, all, or none of six groups $g_i$ ($i = 1, \ldots, 6$). Let the number of individuals
belonging to both $g_i$ and $g_j$ be given by the element in the ith row
and jth column of the matrix shown in Table 1. The elements in the main
diagonal of this matrix represent the number of individuals in each of the 
groups. 

The matrix P (Table 2) is obtained from the matrix X by dividing each
row of X by the corresponding entry in the main diagonal of X. The rows of
P can be regarded as vectors representing $g_i$; the squares of the lengths of
these vectors, $d_i^2$, are obtained by squaring and summing the entries in each
row of P.

Normalized (unit) row vectors (Table 3) are obtained from P by multiplying
the rows of P by $1/\sqrt{d_i^2}$. The sums of the squares of the entries in each
row of F should total unity (within rounding error). The matrix F can be considered
as a matrix of factor loadings. The entries in row i of F represent
the cosines of the angles between $g_i$ and a set of hypothetically independent
groups represented by an orthogonal set of basis vectors. From F one generates
a correlation matrix in the same manner in which a correlation matrix is
obtained from a set of orthogonal factor loadings, i.e., by post-multiplying
F by its transpose.
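
The steps of the numerical example can be sketched in a few lines. The counts below are the matrix X of the example as read from Table 1; the function name `overlap_correlations` is introduced here for illustration. The sketch carries full precision throughout, so its results may differ slightly in the second decimal from tables that were computed from rounded intermediate values.

```python
import math

# Joint-membership counts n_ij of the numerical example (diagonal = group sizes).
X = [
    [200,  50, 100,  50, 150,   20],
    [ 50, 200, 100,  50,  50,  100],
    [100, 100, 400,  50,  50,  100],
    [ 50,  50,  50, 400, 300,  100],
    [150,  50,  50, 300, 800,  200],
    [ 20, 100, 100, 100, 200, 2000],
]


def overlap_correlations(X):
    """Return (P, d2, F, R) following equations (3) through (6)."""
    k = len(X)
    # P: each row of X divided by its own diagonal entry n_ii.
    P = [[X[i][j] / X[i][i] for j in range(k)] for i in range(k)]
    # Squared length of each row of P.
    d2 = [sum(p * p for p in row) for row in P]
    # F: rows of P scaled to unit length.
    F = [[p / math.sqrt(d2[i]) for p in P[i]] for i in range(k)]
    # R = FF': cosines between pairs of unit row vectors.
    R = [[sum(F[i][t] * F[j][t] for t in range(k)) for j in range(k)]
         for i in range(k)]
    return P, d2, F, R


P, d2, F, R = overlap_correlations(X)
for row in R:
    print("  ".join(f"{r:.2f}" for r in row))
```

Each row of F has unit length by construction, so the diagonal of R is unity and the off-diagonal entries are the proposed indices.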

A short-cut procedure is to compute the matrix XX'. If $e_{ij}$ is the typical
element in that matrix, then an element of the matrix R is given by

$$r_{ij} = \frac{e_{ij}}{\sqrt{e_{ii}}\,\sqrt{e_{jj}}}.$$

In Table 5 the matrix XX' is computed; in Table 6 the typical element is
$\sqrt{e_{ii}}\,\sqrt{e_{jj}}$. The matrix R is obtained by dividing each element of Table 5
by the corresponding element in Table 6.
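
The short cut works because F is X premultiplied by a diagonal matrix, so that FF' is XX' rescaled on both sides by that diagonal matrix, and the requirement of a unit diagonal fixes each scale factor at $1/\sqrt{e_{ii}}$. A minimal sketch of the computation follows, using the counts of the numerical example; the variable names are illustrative.

```python
import math

# Joint-membership counts n_ij of the numerical example.
X = [
    [200,  50, 100,  50, 150,   20],
    [ 50, 200, 100,  50,  50,  100],
    [100, 100, 400,  50,  50,  100],
    [ 50,  50,  50, 400, 300,  100],
    [150,  50,  50, 300, 800,  200],
    [ 20, 100, 100, 100, 200, 2000],
]
k = len(X)

# E = XX': e_ij is the inner product of rows i and j of X.
E = [[sum(X[i][t] * X[j][t] for t in range(k)) for j in range(k)] for i in range(k)]

# Divisors sqrt(e_ii) * sqrt(e_jj), then the element-by-element quotient.
root_diag = [math.sqrt(E[i][i]) for i in range(k)]
divisors = [[root_diag[i] * root_diag[j] for j in range(k)] for i in range(k)]
R = [[E[i][j] / divisors[i][j] for j in range(k)] for i in range(k)]

for row in R:
    print("  ".join(f"{r:.2f}" for r in row))
```

The matrix P and the normalization step are never formed explicitly, which is the saving the short cut offers.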


REFERENCES 


1. Adkins, D. C. The simple structure of the American Psychological Association. Amer. 
Psychologist, 1954, 9, 175-180. 

2. Carroll, J. B. The effect of difficulty and chance success upon correlations between items 
or between tests. Psychometrika, 1945, 10, 1-19. 

3. Wherry, R. J. and Gaylord, R. H. Factor pattern of test items and tests as a function 
of the correlation coefficient: content, difficulty, and constant error factors. Psychometrika, 
1944, 9, 237-244. 

4. Wherry, R. J. and Winer, B. J. A method for factoring large numbers of test items. 
Psychometrika, 1953, 18, 161-179. 


Manuscript received 5/26/54 


Revised manuscript received 7/5/54 























PSYCHOMETRIKA—VOL. 20, No. 1 
MARCH, 1955 


AN EXTENSION OF ANDERSON’S SOLUTION 
FOR THE LATENT STRUCTURE EQUATIONS 


W. A. Gibson*


CENTER FOR ADVANCED STUDY IN THE BEHAVIORAL SCIENCES 


Anderson’s solution for the latent structure equations is summarized 
and then extended in two ways so as to involve all items simultaneously. 


Some time ago Lazarsfeld and Dudman (4) achieved a solution, by means 
of determinantal equations, for Lazarsfeld’s latent structure equations. 
Recently their solution was extended by Anderson (1) in such a way as to 
involve matrix manipulations only. Both of these solutions have the ad- 
vantage over that of Green (2) of avoiding the need for estimating unknown 
elements in the manifest matrices—the elements with recurring subscripts. 
They have the disadvantage, however, of using much less of the empirical 
data than does Green’s solution. The purpose of this note is to indicate two 
ways in which Anderson’s solution can be extended so as to involve more of 
the empirical data and thus compare more favorably with Green’s solution 
in that regard. 

The latent structure equations have been developed elsewhere (3) and
will merely be restated here in matrix form:

$$R = L'VL, \tag{1}$$

and

$$R_k = L'VD_kL, \tag{2}$$

where R is the sample joint proportions matrix bordered by the manifest
marginals, $R_k$ is the sample triple proportions matrix for item k, bordered
by the joint proportions involving item k, L' contains the latent marginals
for all items and has its top row filled with 1's, V is diagonal and contains the
relative class sizes in its diagonal cells, and $D_k$ is diagonal and contains the
entries from row k of L' in its diagonal cells. All diagonal cells but the first
in R and $R_k$ and all cells in row and column k of $R_k$ are empty and would
have to be estimated if those cells were directly involved in the solution.
The order of R and $R_k$ is n + 1, n being the number of items involved, and
the rank of all matrices in (1) and (2) is m, the number of latent classes
needed to account for the manifest data.


*This article was written while the author was employed at The University of North
Carolina.


Lazarsfeld (3, p. 389) has defined a basic determinant of R as a determinant
formed from the rows and columns of R in such a way as to include
the first diagonal element in R but no other diagonal element. Thus no basic
determinant in R contains unknown elements. A basic determinant of $R_k$
would be analogously defined and would contain no unknown elements provided
row and column k of $R_k$ were not involved. It is here convenient to
speak of the basic sub-matrices of R and $R_k$ as being the matrices of the basic
determinants. For the present purpose the basic sub-matrices will always be
dealt with in pairs, one from R and the corresponding one from $R_k$. Consequently
the further restriction will be imposed that neither row nor column
k of either R or $R_k$ may be involved in a pair of basic sub-matrices. Finally,
we shall be concerned only with basic sub-matrices of order and rank m.

Let P and $P_k$ represent such a pair of basic sub-matrices. Then, by virtue
of (1) and (2),

$$P = L_1'VL_2, \tag{3}$$

and

$$P_k = L_1'VD_kL_2, \tag{4}$$

where $L_1'$ is a square matrix made up of the first row of L' and of m − 1 other
rows from L', and $L_2$ is a square matrix made up of the first column of L and of
m − 1 other columns from L. From the restrictions that have just been
stated, it follows that no item is represented both in $L_1'$ and $L_2$, and that item k
is represented neither in $L_1'$ nor in $L_2$. Item k is, however, represented in $D_k$.
Because of its role in the formation of $R_k$, Lazarsfeld has referred to item k
as the stratifier (cf. 3, pp. 391-392).
Anderson's solution is simply to form the matrix,

$$A = P^{-1}P_k = L_2^{-1}V^{-1}(L_1')^{-1}L_1'VD_kL_2 = L_2^{-1}D_kL_2, \tag{5}$$

and then obtain the characteristic roots and the right-sided characteristic
vectors of A to get $D_k$ and $L_2^{-1}K$, where K is an arbitrary diagonal matrix
and remains, for the moment, unknown. Post-multiplying (5) through by
$L_2^{-1}K$ shows that $L_2^{-1}K$ gives the right-sided characteristic vectors of A and
that $D_k$ contains the latent roots of A. Thus,

$$AL_2^{-1}K = L_2^{-1}D_kL_2L_2^{-1}K = L_2^{-1}D_kK = L_2^{-1}KD_k. \tag{6}$$

Post-multiplying (3) by $L_2^{-1}K$ gives

$$PL_2^{-1}K = L_1'VL_2L_2^{-1}K = L_1'VK. \tag{7}$$

Thus $L_1'$ becomes available except for multipliers on its columns. These
multipliers turn out to be simply the entries in the first row of $L_1'VK$, since
the first row of $L_1'$ must contain only 1's. $L_1'$ is thus obtained from the relationship,

$$L_1' = (L_1'VK)(VK)^{-1}. \tag{8}$$

Given $L_1'$, the matrix product $VL_2$ is obtained from (3) as follows:

$$VL_2 = (L_1')^{-1}P. \tag{9}$$

Both V and $L_2$ can now be obtained because the first column in $VL_2$
contains the diagonal elements of V.
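
The sequence (5) through (9) can be traced on a small constructed case. The sketch below assumes m = 2 latent classes, for which the characteristic roots and vectors of A have a closed form, and builds P and $P_k$ from a hypothetical latent structure so that the recovered values can be compared with the ones put in. All numerical values and function names are illustrative assumptions, not data from the article.

```python
import math


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def inv2(M):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def anderson_two_class(P, Pk):
    """Recover (L1', diag V, L2, diag Dk) from P and Pk when m = 2."""
    A = matmul(inv2(P), Pk)                                   # equation (5)
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = math.sqrt(tr * tr - 4.0 * det)
    roots = [(tr + disc) / 2.0, (tr - disc) / 2.0]            # diagonal of Dk
    # Right-sided characteristic vectors as columns (requires A[0][1] != 0);
    # their arbitrary scaling plays the role of the unknown diagonal matrix K.
    vecs = [[A[0][1], A[0][1]],
            [roots[0] - A[0][0], roots[1] - A[0][0]]]
    L1VK = matmul(P, vecs)                                    # equation (7)
    L1t = [[L1VK[i][j] / L1VK[0][j] for j in range(2)] for i in range(2)]  # (8)
    VL2 = matmul(inv2(L1t), P)                                # equation (9)
    v = [VL2[0][0], VL2[1][0]]            # first column holds the class sizes
    L2 = [[VL2[i][j] / v[i] for j in range(2)] for i in range(2)]
    return L1t, v, L2, roots


# Hypothetical latent structure used only to manufacture consistent P and Pk.
true_L1t = [[1.0, 1.0], [0.9, 0.2]]     # ones row, then one item's latent marginals
true_V = [[0.6, 0.0], [0.0, 0.4]]       # relative class sizes
true_L2 = [[1.0, 0.8], [1.0, 0.3]]      # ones column, then another item
true_Dk = [[0.7, 0.0], [0.0, 0.1]]      # latent marginals of the stratifier

P = matmul(matmul(true_L1t, true_V), true_L2)
Pk = matmul(matmul(matmul(true_L1t, true_V), true_Dk), true_L2)

L1t, v, L2, roots = anderson_two_class(P, Pk)
print("roots:", roots)
print("L1':", L1t)
print("class sizes:", v)
print("L2:", L2)
```

With error-free proportions the latent quantities are recovered up to the order of the classes; with sample data the same steps yield estimates.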

In this form the solution by Anderson involves only 2m − 1 of the
items: the m − 1 items represented in $L_1'$, the m − 1 items in $L_2$, and the
stratifier k. There are two ways in which Anderson's solution can be extended
to involve all of the items. No unknown elements will be introduced into the
manifest matrices that are used. One way is to use a composite stratifier
consisting of some combination of any of the items that are not represented
in $L_1'$ or $L_2$. The other way is to augment the basic sub-matrices (hence also
$L_1'$) by additional rows representing all items not involved in $L_2$ or in the
stratifier.

The composite stratifier will be considered first. Let the subscript $kl\cdots$
stand for a combination of any of the items that are not involved in $L_1'$ or $L_2$.
Then the sum of the corresponding triple proportions matrices is given by

$$R_{kl\cdots} = R_k + R_l + \cdots = L'VD_kL + L'VD_lL + \cdots = L'V(D_k + D_l + \cdots)L = L'VD_{kl\cdots}L. \tag{10}$$

By analogy with (10) the latent structure equation for a basic sub-matrix
in $R_{kl\cdots}$ is

$$P_{kl\cdots} = L_1'VD_{kl\cdots}L_2. \tag{11}$$

$P_{kl\cdots}$ can be used in Anderson's solution in exactly the same way as is $P_k$.
In the special case where the subscript $kl\cdots$ refers to all of the items not
involved in $L_1'$ and $L_2$, it turns out that there is only one possible $P_{kl\cdots}$.

A pre-publication reviewer has suggested that any weighted sum, and
not just the simple sum, of $R_k$ and $D_k$ matrices could represent a composite
stratifier.

Now consider the second way in which Anderson's solution can be
extended. Let $P$ and $P_{kl\cdots}$ be augmented by additional rows representing all
of the items that are not represented in $L_2$ or in the (single or composite)
stratifier. Thus $P$, $P_{kl\cdots}$, and $L_1'$ cease to be square, but (3) and (11) still
hold, and all other matrices in those equations remain square.

Now form the matrix

$$\begin{aligned}
B &= (P'P)^{-1}P'P_{kl\cdots} = (L_2'VL_1L_1'VL_2)^{-1}L_2'VL_1L_1'VD_{kl\cdots}L_2 \\
  &= L_2^{-1}V^{-1}(L_1L_1')^{-1}V^{-1}(L_2')^{-1}L_2'VL_1L_1'VD_{kl\cdots}L_2 \\
  &= L_2^{-1}D_{kl\cdots}L_2 .
\end{aligned} \tag{12}$$

The last step of (12) is identical with that of (5), except that the stratifier
may here be composite. Thus the solution is the same as for Anderson from
this point on, except that (9) is replaced by

$$VL_2 = (L_1L_1')^{-1}L_1P \tag{13}$$

because $L_1'$ is no longer square.

It is perhaps worth mentioning that this extension of Anderson's solution
can be shown to have two least-squares properties. The first is that the
matrix $B$ in (12) is such as to minimize the sum of squared discrepancies
between the matrices $P_{kl\cdots}$ and $PB$. The second is that the matrix $VL_2$ in
(13) is such as to minimize the sum of squared discrepancies between the
matrices $P$ and $L_1'VL_2$.

A few remarks may be in order as to which items should be involved in
each of the three matrices $L_1$, $L_2$, and $D_{kl\cdots}$. Perhaps the best way to proceed
is to locate that basic sub-matrix in $R$ which seems from inspection to have
the most clear-cut rank $m$. Attention might next be given to the make-up of
the stratifier. The principal requirement here is dictated by the role of $D_{kl\cdots}$
in (12). $D_{kl\cdots}$ contains the characteristic roots of $B$, and if the right-sided
characteristic vectors of $B$ are to be unique, all diagonal elements in $D_{kl\cdots}$
must be distinct and non-zero. At times it may be better to use only one
item as a stratifier in order to insure this distinctness. At other times a
composite may serve better than any single item in giving fairly even spread
to the characteristic roots of $B$. In any event, some trial and error may be
involved in the formation of an acceptable stratifier, for it will not always be
possible to predict the necessary latent marginals with sufficient accuracy
for this purpose. The solution by Green has this same problem (2, p. 158).
Finally, all items not involved in the chosen basic sub-matrix of $R$ nor in the
trial stratifier can be thrown into the extra rows of $P$ and $P_{kl\cdots}$.

After an appropriate $P$ and $P_{kl\cdots}$ have been formed according to the
requirements mentioned in the previous paragraph, the computing steps are
as follows: (1) compute $P'P$ and get its inverse; (2) form the matrix
$(P'P)^{-1}P'P_{kl\cdots}$ and obtain its characteristic roots ($D_{kl\cdots}$) and right-sided
characteristic vectors ($L_2^{-1}K$); (3) compute the matrix product $PL_2^{-1}K$ and
divide each of its columns through by the first entry in that column to obtain
$L_1'$; (4) compute $L_1L_1'$ and get its inverse; (5) form the matrix $(L_1L_1')^{-1}L_1P$
and divide each of its rows through by the first entry in that row to obtain
$L_2$; (6) form the diagonal matrix $V$ from the divisors of step (5).
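
The six computing steps can be sketched numerically. The following is only an illustration, not part of the original paper: it assumes NumPy, error-free manifest matrices built from made-up latent parameters, and an ascending ordering of the characteristic roots to fix the order of the classes. The function name `extended_anderson` and all numerical values are hypothetical.

```python
import numpy as np


def extended_anderson(P, P_strat):
    """Follow computing steps (1)-(6) for an augmented P and its stratified form."""
    # Step 1: compute P'P and its inverse.
    PtP_inv = np.linalg.inv(P.T @ P)
    # Step 2: form (P'P)^{-1} P' P_strat; get its roots and right-sided vectors.
    B = PtP_inv @ P.T @ P_strat
    roots, vecs = np.linalg.eig(B)
    order = np.argsort(roots.real)  # assumption: order classes by ascending root
    roots = roots.real[order]
    vecs = vecs.real[:, order]
    # Step 3: P times the vectors, each column divided by its first entry, gives L1'.
    M = P @ vecs
    L1t = M / M[0, :]
    # Step 4: compute L1 L1' and its inverse.
    L1 = L1t.T
    G_inv = np.linalg.inv(L1 @ L1t)
    # Step 5: (L1 L1')^{-1} L1 P, each row divided by its first entry, gives L2.
    VL2 = G_inv @ L1 @ P
    divisors = VL2[:, 0].copy()
    L2 = VL2 / divisors[:, None]
    # Step 6: the divisors of step 5 form the diagonal matrix V.
    return roots, L1t, L2, np.diag(divisors)


# Hypothetical three-class example (first row of L1' and first column of L2 are 1's).
true_L1t = np.array([[1.0, 1.0, 1.0],
                     [0.9, 0.5, 0.1],
                     [0.8, 0.3, 0.2],
                     [0.7, 0.6, 0.1]])
true_L2 = np.array([[1.0, 0.8, 0.6],
                    [1.0, 0.4, 0.7],
                    [1.0, 0.2, 0.1]])
true_V = np.diag([0.5, 0.3, 0.2])
true_D = np.diag([0.2, 0.5, 0.9])  # distinct, non-zero stratifier marginals

P = true_L1t @ true_V @ true_L2
P_strat = true_L1t @ true_V @ true_D @ true_L2
roots, L1t, L2, V = extended_anderson(P, P_strat)
```

With fallible data the recovered matrices would only approximate the latent parameters, and the roots need not come out in a convenient order.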


REFERENCES 


1. Anderson, T. W. On estimation of parameters in latent structure analysis. Psychometrika, 
1954, 19, 1-10. 

2. Green, Bert F., Jr. A general solution for the latent class model of latent structure 
analysis. Psychometrika, 1951, 16, 151-166. 























3. Lazarsfeld, Paul F. The logical and mathematical foundation of latent structure analysis.
Chapter 10 in Stouffer, S. A., et al. Measurement and prediction. Princeton: Princeton
University Press, 1950.

4. Lazarsfeld, Paul F., and Dudman, Jack. Paper No. 5 and Introduction to Paper No. 6
in Part II of Lazarsfeld, Paul F., et al. The use of mathematical models in the measure-
ment of attitudes, Rand research memorandum no. 455, Santa Monica, The Rand
Corporation, 1951 (mimeographed).


Manuscript received 5/4/54 


Revised manuscript received 6/2/54 




















PSYCHOMETRIKA—VOL. 20, NO. 1 
MARCH, 1955 


A FACTOR ANALYSIS OF 
MENTAL ABILITIES AND PERSONALITY TRAITS* 


J. C. DENTON 


PROCTER & GAMBLE 
AND 


CALVIN W. TAYLOR†


UNIVERSITY OF UTAH 


The relationship between measures of verbal fluency and certain person- 
ality traits is examined by factor techniques. From a matrix of eight factor 
scores derived from mental tests plus five personality scores, six factors were 
obtained. An oblique solution lends limited support to the hypothesized 
relationship between the two domains. 


In factorial studies of abilities, it has become general practice to include 
two or three “anchor” tests to measure each of the primary mental abilities 
that might be related to the experimental variables. In this way, one or 
more new factors may be isolated and interpreted with each successive 
well-planned study. The “anchor” tests which will probably best measure 
each of the several ‘‘established”’ factors can be identified fairly well. 

This is not yet the case, however, for the area of temperament and 
personality, where there is much less agreement upon “anchor” variables. 
Using several different approaches, Cattell (1) has rather consistently found 
ten to twelve factors. Guilford has developed three inventories, STDCR, 
GAMIN, and I, which represent end products of his efforts to measure 
temperament factors. Although no serious effort has been made to compare 
the works of these authors, it seems that some of their factors may be quite 
similar while others apparently do not appear in both sets. 

Few studies have straddled mental abilities and personality traits, even
though the nature of any relationships found would be of considerable
theoretical and practical importance. Thornton (9) found practically no
overlap between tests of mental abilities and four questionnaire-type variables
which measured a single factor called “Feeling of Adequacy.” Other studies
likewise find little relationship (2). There is supporting evidence, however,
for the hypothesis that fluent persons tend to be independent, extraverted,
and unstable (7).

*This study was supported by a grant from the Research Foundation of the Uni-
versity of Utah.
†Currently on leave of absence with the National Research Council.

The present study is an effort to help define the relationships between 
mental abilities and personality traits, the latter being measured by a question- 
naire. The major hypothesis was that there would be some relationship 
between measures of verbal fluency and extraversion or rhathymia. Studies 
by Cattell and other British investigators (6) would tend to support this 
hypothesis. The relationship among the mental ability scores was also of 
interest since this involved, in a sense, a second-order factor study of eight 
cognitive factors. The personality score intercorrelations, which are ad- 
mittedly distorted by experimental-dependence conditions in scoring (3), 
were of minor interest. 


The Variables 


Data on twenty-eight mental ability tests and on a personality inventory 
were collected by Taylor, the mental ability tests furnishing the basis for 
his study of fluency (8). For the present study, the fifteen tests were selected 
which best measured Taylor’s eight primary abilities. Scores on two tests 
measuring the same factor were combined with equal weights to obtain a 
single index. (In the case of the Perceptual Speed factor, only one test was 
used.) These eight factor indices were included with five scores from a 
personality inventory for this study. 


The eight factor indices, for which the tests are described by Taylor,
were as follows:


1. Memory (First Names, Word-Number)
2. Perceptual Speed (Identical Numbers)
3. Reasoning (Letter Series, Letter Grouping)
4. Number (Addition, Multiplication)
5. Verbal Comprehension (Same or Opposite, Completion)
6. Word Fluency (First and Last Letters, Suffixes)
7. Verbal Versatility (Similes, Letter Star)
8. Ideational Fluency (Topics, Theme)

The remaining five personality variables from Guilford’s “Inventory of 
the Factors STDCR” were: 


9. Social Introversion 
10. Thinking Introversion 
11. Depression 
12. Cycloid Tendency 
13. Rhathymia 











Procedures and Results


The data were obtained on 170 high-school seniors in Washington, D.C.
The score distributions on the eight factor scores were normalized.
The matrix of correlation coefficients for the 13 variables was analyzed by
Thurstone’s group centroid method. Six factors were extracted and an
oblique rotational solution obtained.


TABLE 1

Intercorrelations (above diagonal) and Residuals (below diagonal)

Var.    1    2    3    4    5    6    7    8    9   10   11   12   13
  1         04   35   21   22   19   14   17  -07   10  -10  -08  -08
  2   -05        34   34   24   23   26   27  -20   07   01   05   22
  3    01   06        33   51   36   40   33  -12  -12  -19  -10   14
  4    02   02  -03        18   30   21   22  -16  -10  -11  -07   22
  5    00   03   05  -03        36   40   38   0?   01  -18  -18  -05
  6   -01  -04  -02   03   02        32   32  -07   05  -01   07   07
  7    00  -02   01   03  -01  -02        52  -16   11   0?   16   27
  8    02  -01  -01   03  -04  -01  -03       -17   16  -05  -01   10
  9   -01  -03  -01   03   02  -01  -01  -01        06   45   23  -58
 10    03   02  -01  -02   05  -01  -02   00  -01        53   48  -12
 11   -01  -01   09   01  -03  -02   02   01   00   03        90  -08
 12    00  -02  -01   00   00   03  -02   01   03   0?   0?        22
 13    00  -01  -01   02   02  -0?   02  -01  -08  -04  -02  -03
TABLE 2

The Unrotated Factor Matrix

Var.    I   II  III   IV    V   VI   h²
  1    38  -02  -10   03   38  -22   35
  2    46   07   15  -05  -26  -15   34
  3    70  -14  -01  -20   20  -01   59
  4    48  -09   11  -18  -13  -33   42
  5    63  -12  -27   00   01   22   53
  6    55   07  -02  -04  -02  -01   31
  7    62   16   15   09  -02   33   56
  8    62   06  -01   28  -13   20   52
 -9    20  -29   66   34   02  -16   70
 10    06   63  -08   39   0?  -10   57
 11   -13   9?  -11  -21  -07   03   96
 12   -0?   92   27  -26   05   12  100
 13    21   02   76  -12  -07   05   65













The intercorrelations are presented in Table 1 along with the sixth- 
factor residuals (below the principal diagonal). The correlations among the 
personality scores were highest, as might be expected. In general, those among 
the mental abilities were next in size and those between mental abilities and 
personality traits were the lowest. 

Table 2 presents the centroid matrix and Table 3 the factor matrix 


TABLE 3

Final Rotated Matrix

Var.    A    B    C    D    E    F   Variable
  1   -02   47   00   03  -06   16   Memory
  2    40  -10   16   06   06   06   Perceptual Speed
  3    11   36   33  -06   07  -10   Reasoning
  4    46   10   00  -03  -01  -03   Number
  5     ?   08   57  -12  -14  -04   Verbal Comprehension
  6    18   11   31   0?   02   05   Word Fluency
  7   -05   00   61   09   29   12   Verbal Versatility
  8    06  -10   57  -05   06   28   Ideational Fluency
 -9     ?    ?  -08  -39   51   39   Social Extraversion
 10     ?    ?   03   50  -03   54   Thinking Introversion
 11    09  -06  -02   94   01   00   Depression
 12   -05   05   01   93   42   01   Cycloid Tendency
 13    04  -03   0?   00   71   01   Rhathymia
TABLE 4

Final Transformation Matrix

        A    B    C    D    E    F
  I    27   20   57  -01   ?8   13
  II  -24  -18   14  -29  -05    ?
  III -68   92  -17   10   23   11
  IV  -64  -28   78  -08   34  -27
  V    04   01   02   95    ?    ?
  VI  -07  -03  -15  -0?   90   13


TABLE 5

Reference Vector Cosines

        A    B    C    D    E
  B   -35
  C   -24  -27
  D    09   17  -10
  E   -40   12   12   08
  F   -08   04  -0?   00   0?





after rotation to simple structure. In these two tables and in all discussions 
hereafter, variable 9, Social Introversion, is treated as a reflected variable 
and is labeled — 9 and called Social Extraversion, for convenience. Table 4 
presents the final transformation matrix and Table 5 the intercorrelations 
between the reference vectors. 
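
The relation between the centroid loadings, the communalities, and the sixth-factor residuals can be checked by hand. The sketch below is illustrative only. It assumes that the last column of Table 2 holds communalities (sums of squared loadings) and that each residual in Table 1 is the observed correlation minus the sum of cross-products of the six loadings. The rows used are those of variables 1 and 3 as read from Table 2; the helper names are hypothetical.

```python
# Unrotated loadings on factors I-VI, as read from Table 2 (decimal points restored).
loadings = {
    1: [0.38, -0.02, -0.10, 0.03, 0.38, -0.22],   # Memory
    3: [0.70, -0.14, -0.01, -0.20, 0.20, -0.01],  # Reasoning
}


def communality(row):
    """Sum of squared loadings for one variable."""
    return sum(a * a for a in row)


def reproduced_correlation(row_i, row_j):
    """Correlation implied by the factors: sum of cross-products of loadings."""
    return sum(a * b for a, b in zip(row_i, row_j))


def residual(observed_r, row_i, row_j):
    """Observed correlation minus the correlation reproduced by the factors."""
    return observed_r - reproduced_correlation(row_i, row_j)


h2_memory = communality(loadings[1])        # tabled value: .35
h2_reasoning = communality(loadings[3])     # tabled value: .59
res_1_3 = residual(0.35, loadings[1], loadings[3])  # tabled residual: .01
```

Agreement is only to within rounding, since the tabled loadings are given to two places.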

The six factors in the rotated solution include three factors involving
mental abilities and three mainly concerned with personality traits. None of
the personality variables had loadings on the mental ability factors, but
two fluency variables did have loadings of almost .30 on each of two personality
factors.

All variables with loadings greater than .25 are shown below for each 
factor. The interpretations are quite tentative, since the composite variables 
used are not like those customarily employed in factorial studies to define 
either primary or second-order factors. 


Factor A 


4. Number .46
2. Perceptual Speed .40


This is apparently a number, or perhaps a speed, factor. The appearance 
of Perceptual Speed (Identical Numbers test) is not surprising. In Taylor’s 
original fluency study the same variable had a loading of .24 on the number 
factor. Tests designed to measure such abilities as Perceptual Speed or 
Carefulness which involve the manipulation of numbers frequently have 
significant projections on a number factor. 


Factor B 
1. Memory .47
3. Reasoning .36


Factor B is tentatively designated as a memory factor. Some of the 
major studies in the field have shown that tests of reasoning ability are 
related to memory (2, p. 148). 


Factor C 
7. Verbal Versatility .61
8. Ideational Fluency .57
5. Verbal Comprehension .57
3. Reasoning .33
6. Word Fluency .31


This factor approaches a general factor of mental ability best represented 
by verbal tests, particularly by measures of fluency which involve the meaning 
of words. 


Factor D 
11. Depression .94
12. Cycloid Tendency .93
10. Thinking Introversion .50
-9. Social Extraversion -.39



This factor approaches a general (to this battery) measure of personality,
each of the variables except Rhathymia having projections on it. It may be
tentatively interpreted as Depression. Items dealing with “moodiness,”
“feelings easily hurt,” “lost in thought,” and “self-conscious” are typical of
the depressive-type item contained in common in the S, T, C, and D scoring
keys. Items such as these can account for much of the variance in this factor.
The negative loading of variable 9, which was reflected prior to factoring,
means that the unreflected variable, Social Introversion, is positively related
to this factor. Factor D corresponds closely to Lovell’s (4) factor which was
called “Emotionality,” or the opposite pole of Thurstone’s (10) “Emotional
Stability” factor.


Factor E 
13. Rhathymia .71
-9. Social Extraversion .51
12. Cycloid Tendency .42
7. Verbal Versatility .29


Surgency is probably the best interpretation that can be given to this 
factor, in spite of the leading variable. Cattell has pointed out the similarity 
between Surgency and Rhathymia. This interpretation is supported by the 
positive loading of Social Extraversion and by the fact that it fits Studman’s 
definition of a fluent person. In many ways it corresponds to the “Drive” 
factor found by Lovell. The loading of Verbal Versatility lends limited support 
to the original hypothesis. The correlation of .27 between Rhathymia and 
Verbal Versatility was larger than any other correlation cutting across the 
cognitive and personality domains. The other two types of fluency, Ideational
Fluency and Word Fluency, showed no relationship with this “Surgency”
factor.


Factor F 
10. Thinking Introversion .54
-9. Social Extraversion .39
8. Ideational Fluency .28


This rather ambiguous factor is not strongly determined and is difficult
to interpret. Abstracting from Guilford’s original definitions, this factor may
represent “meditative thinking, philosophizing, and analyzing one’s self”
in addition to “entering into social contact, not shy” plus “fluent expression
of ideas.” Such a trait configuration seems somewhat unlikely. With present
crude insights, it is difficult to sense what might be in common among these
personality and mental ability scores. Again, it was a fluency score involving
the meaning of words which showed a slight sign of bridging the gap into the
personality domain.



















Discussion 

Since the mental ability variables analyzed here are all composites except 
one, the factor analysis results are similar in one sense to the second-order 
analysis reported by Rimoldi (5). Relationships in such studies generally 
seem magnified. This provides another reason for considering the interpre- 
tations of the factors as tentative. 

The hypothesis that there is a relationship between fluency scores and 
certain personality characteristics is supported to a limited degree. The 
evidence relevant to this hypothesis is as follows: the fluency measure,
Verbal Versatility, had a projection of .29 on the Surgency factor (E) and
correlated .27 with Rhathymia. Ideational Fluency had a projection of .28
on an ambiguous personality factor (F); Word Fluency had zero loadings
on the three personality factors, and all personality scores had zero loadings 
on the three mental ability factors. The results for the remaining mental 
ability variables are consistent with those of other investigations, in which 
many zero and a few low relationships between the mental ability and 
personality areas are reported. Improvement in test construction in both 
domains and further analyses may lead to higher correlations and also to 
greater insight into the bases of any relationships that appear. 


REFERENCES 


1. Cattell, R. B. Personality: A systematic, theoretical and factual study. New York: 
McGraw-Hill, 1950. 

2. French, John W. The description of aptitude and achievement tests in terms of rotated 
factors. Psychometr. Monogr. No. 5, Chicago: The University of Chicago Press, 1951. 

3. Guilford, J. P. When not to factor analyze. Psychol. Bull., 1952, 49, 26-37. 

4. Lovell, C. A study of the factor structure of thirteen personality variables. Educ. 
psychol. Meas., 1945, 5, 335-350. 

5. Rimoldi, H. J. A. The central intellective factor. Psychometrika, 1951, 16, 75-101. 

6. Rogers, C. A. A factorial study of verbal fluency and related dimensions of personality. 
Amer. Psychologist, 1952, 7, 290. (Abstract). 

7. Studman, L. Grace. Studies in experimental psychiatry. V. ‘w’ and ‘f’ factors in relation 
to traits of personality. J. ment. Sci., 1935, 81, 107-137. 

8. Taylor, C. W. A factorial study of fluency in writing. Psychometrika, 1947, 12, 239-262. 

9. Thornton, G. R. A factor analysis of tests designed to measure persistence. Psychol. 
Monogr., 1939, 51, No. 229. 

10. Thurstone, L. L. The dimensions of temperament. Psychometrika, 1951, 16, 11-20. 


Manuscript received 1/18/54 
Revised manuscript received 4/12/54 











PSYCHOMETRIKA—VOL. 20, NO. 1 
MARCH, 1955 


A TABULAR METHOD OF OBTAINING TETRACHORIC r WITH 
MEDIAN-CUT VARIABLES. 


GEORGE SCHLAGER WELSH 


THE UNIVERSITY OF NORTH CAROLINA 


A method is presented that enables the immediate determination of 
tetrachoric r from a table if the proportion in the plus-plus cell for median-cut 
variables is known. 


There is an ever-increasing use of factor analysis, cluster analysis, and 
related techniques in psychological research. Since numerous coefficients 
of correlation are required for the matrices, many investigators have employed 
tetrachoric r’s and have utilized various short-cut methods for obtaining 
these coefficients. This seems to be especially prevalent in preliminary 
investigations where the greater exactitude of more time-consuming methods 
of determining correlation is not feasible. 

In many cases continuous distributions are dichotomized; it is often 
possible to make the cuts at the medians. The writer was able to divide at 
the median 24 of the 26 variables employed in a recent problem. To facilitate 
the determination of tetrachoric r a table was prepared so that the coefficient 
could be determined immediately if the proportion in the plus-plus cell were 
known (Table 1). 

The table was prepared by using the computing diagrams of Chesire,
Saffir, and Thurstone (1). To use these diagrams data are arranged in a
fourfold table as follows:


        +     -
   +    c           b
   -
        a


From the diagram for a = .50 the value of c corresponding to a particular
value of $r_{tet}$ was determined by noting where the $r_{tet}$ curves from .10 through
.95 cut the vertical line for b = .50. The proportions for 1.00 and .00 are, of
course, known. These twelve points then described a curve with $r_{tet}$ from
.00 to 1.00 on the ordinate and proportions from .50 to .25 on the abscissa.
These points were located on a large (26 by 30 inch) sheet of graph paper


TABLE 1

Three-Place Tetrachoric r Corresponding to Proportion in Plus-Plus
Cell for Median-Cut Variables

Proportions
         000   001   002   003   004   005   006   007   008   009
490      992   993   994   994   995   996   997   998   999   9995
480      984   985   986   987   988   989   990   990   991   992
470      972   97?   975   976   978   979   98?   98?   98?   98?
460      95?   960   961   962   964   965   966   968   969   971
450      943   94?   946   94?   94?   950   95?   95?   95?   95?
440      92?   926   928   930   932   933   935   937   939   941
430      90?   90?   908   910   912   914   916   918   920   922
420      878   881   884   887   890   893   895   898   900   902
410      847   850   854   857   860   863   866   869   872   875
400      814   818   821   824   827   831   834   837   840   844
390      777   781   785   789   793   797   800   804   807   811
380      734   739   744   748   753   757   76?   76?   7??   773
370      688   693   697   702   707   711   716   721   725   730
360      639   644   649   654   659   664   669   674   678   683
350      589   594   599   604   609   614   619   624   629   634
340      536   5??   5??   5??   5??   563   568   573   578   58?
330      482   487   493   49?   50?   509   51?   520   525   53?
320      426   43?   4??   4??   4??   4??   4??   4??   4??   477
310      368   37?   380   385   391   397   403   40?   4?4   420
300      309   315   321   327   333   339   345   351   356   362
290      249   255   261   267   273   279   285   291   297   303
280      187   19?   200   206   212   218   224   230   237   243
270      126   132   138   1??   150   157   16?   1??   1??   18?
260      063   069   075   082   088   094   100   106   113   1?9
250      000   006   012   019   025   031   038   044   050   057

Decimal points properly preceding each entry have been eliminated.


and a smooth curve drawn. From the graph the 250 three-place $r_{tet}$’s for
the corresponding proportions were determined.

To insure accuracy a check was made by computing from formula the
proportions for various $r_{tet}$’s lying between .10 and .80. When both variables
are cut at the medians the usual formula for $r_{tet}$ to five terms can be reduced
and rearranged to solve for the proportion in the plus-plus cell as:

$$\text{proportion} = .15915504\left(r_{tet} + \frac{r_{tet}^{3}}{6} + \frac{3r_{tet}^{5}}{40}\right) + .250 .$$

These proportions and those obtained graphically agreed to two places with
only rounding errors in the third place. Values above $r_{tet}$ = .80 could not
be checked by means of the shortened formula but this section of the curve
was redrawn on a larger scale and the values checked.
Table 1 is used in the following way:

(1) when both variables are cut at the medians and arranged in a
fourfold table, determine the proportion to three places falling in the plus-
plus cell;

(2) this proportion is found in the marginal entries of the table;

(3) entering the table read off the $r_{tet}$ to three places from the body of
the table (see example A);

(4) if the proportion in the plus-plus cell is less than .250, the $r_{tet}$ will
be negative. In this case use the proportion in the plus-minus cell (or .500
minus the plus-plus proportion) and place a minus sign before the obtained
$r_{tet}$ (see example B).

Examples:


A.        +     -
    +    67    33   100
    -    33    67   100
        100   100   200        67/200 = .335

At the intersection of row 330 and column 005 read off $r_{tet}$ = .509

B.        +     -
    +    40   110   150
    -   110    40   150
        150   150   300        40/300 = .133, 110/300 = .367

At the intersection of row 360 and column 007 read off $r_{tet}$ = -.674
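
The two examples can be cross-checked numerically. The sketch below is not the graphical procedure used to build Table 1. It assumes the standard closed form for median-cut bivariate-normal variables, proportion = .25 + arcsin(r)/(2π), of which the shortened series quoted in the text is a truncation. Function names are hypothetical.

```python
import math


def tetrachoric_median_cut(p_plus_plus):
    """Closed-form r for median cuts: r = sin(2*pi*(p - .25)).

    Assumes an underlying bivariate normal distribution. Proportions below
    .250 give a negative r directly, without the plus-minus substitution.
    """
    return math.sin(2.0 * math.pi * (p_plus_plus - 0.25))


def proportion_from_series(r):
    """Shortened series quoted in the text, solved for the plus-plus proportion."""
    return 0.15915504 * (r + r ** 3 / 6.0 + 3.0 * r ** 5 / 40.0) + 0.250


r_example_a = tetrachoric_median_cut(67 / 200)   # Table 1 gives .509
r_example_b = tetrachoric_median_cut(40 / 300)   # Table 1 gives -.674
series_p = proportion_from_series(0.5)
exact_p = 0.25 + math.asin(0.5) / (2.0 * math.pi)
```

The closed form agrees with the tabled value in example A to three places and with example B to within about .005. Because the table was read from a hand-drawn curve, small discrepancies of this kind are to be expected.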
REFERENCE 


1. Chesire, L., Saffir, M. and Thurstone, L. L. Computing diagrams for the tetrachoric 
correlation coefficient. Chicago: Univ. Chicago Bookstore, 1933. 


Manuscript received 2/4/54 


Revised manuscript received 3/27/54 











PSYCHOMETRIKA—VOL. 20, NO. 1 
MARCH, 1955 


AN IBM METHOD FOR COMPUTING INTRASERIAL 
CORRELATIONS* 


M. Carr Payne, JR. 
AND 


LEONARD STAUGAS 


UNIVERSITY OF ILLINOIS 


A method for computing intraserial correlations using a 602-A Calculating
Punch, an 077 Collator, a 513 Gang Punch, and a 403 Tabulator is described.
An example of the use of the procedure and an estimate of the time needed with 
each machine are given. This procedure is compared with another method, 
which makes use of a more powerful IBM machine. 


Introduction 


In a recent article, Grant (3) described an experimental approach to 
behavior as a time series which new developments and adaptations of quanti- 
tative techniques have made possible. In the fields of psychophysics and motor 
performance, the new techniques (1, 2, 6) typically have dealt with binary 
data, e.g., did a subject see a light flash or not. A more generally useful tech- 
nique of time series analysis is one which uses continuous data. One such 
technique obtains correlations by calculating Pearson product-moment corre- 
lations within the series (intraserial correlations). An automatic method for 
computing these correlations with typical IBM equipment is described below. 

Computing intraserial correlations requires the pairing of measures which
were separated by any stated number of measures in the original series. Lag x
(or $r_x$) is the case in which an event is displaced x events from the one with
which it is paired in obtaining a serial correlation (or autocorrelation). To find
the correlation coefficient for lag 1 over a series of N measures, it is necessary
to pair measure 1 with measure 2, 2 with 3, 3 with 4, ..., N - 1 with N. The
n for computing the correlation is equal to N minus the lag number. If the
correlations are to be obtained with IBM machines of the order of the 602-A
Calculating Punch, the data must be so organized that each score appears on
the same punched card as the score with which it is to be paired in the correla-
tion computation. The following procedure was worked out using the 602-A
Calculating Punch, the 077 Collator, the 513 Gang Punch, and the 403
Tabulator. [A very interesting approach to this same problem, using a more
powerful IBM calculator, is described in Schipper and Gruenberger (5).]

*This research was supported in part by the United States Air Force under Contract
No. AF 33(038)-25726, monitored by the Air Force Personnel and Training Research
Center. Permission is granted for reproduction, translation, publication, use and disposal
in whole and in part by or for the United States Government.


Outline of the Procedure 


In general terms, the procedure is as follows: The data are punched in 
conventional form across the card, several cards usually being needed to con- 
tain the measures for one series. Punching begins in a column which depends 
on the number of lags to be computed and the number of digits in each meas- 
ure. (See the boxed-in entries on the original cards in Fig. 1). In addition to 
the measures themselves, each card contains appropriate identification of the 
data. A certain number of blank cards, depending on the location of the first
punched column and the number of digits per measure, are now inserted
behind every original data card (see “inserted cards” in Fig. 1). Measures are
punched from the original card into these blank cards using the interspersed 
master-card gang-punching principle. As this gang-punching is carried out, 
each measure is offset to the left by one measure on each succeeding card. 
Thus, if each score contains two digits, a score in columns 23 and 24 on the 
original card appears in columns 21 and 22 on the second card, in columns 19 
and 20 of the third card, etc. This principle, which has been used by various 
workers, was described several years ago by Hartley (4). When the original
cards and the inserted cards have been passed through the Gang Punch once,
the first columns through the deck, depending on the number of digits per 
measure, contain all the measures in the series. In the two digit case these are 
contained in columns 1 and 2 through the deck. All cards which do not contain 
punches in these first columns are discarded. 

The next step is to construct a set of ‘answer’ cards, each of which is 
eventually to contain the computations relating to one correlation. An answer 
card is inserted at the end of each deck representing a series of measures. The 
whole deck is then passed through the 602-A Calculating Punch. The 602-A
is wired to accumulate the $N$, $\sum X$, $\sum Y$, $\sum X^2$, $\sum Y^2$, and $\sum XY$ for the
lag 1 correlation and to punch them into the answer card. To obtain similar
information for lag 2, three steps are followed: (a) the whole file is put through 
the Collator, which in one operation removes the old answer card, removes the 
terminal data card on which the needed Y measure is missing, and inserts a 
new answer card, (b) the 602-A is rewired to accumulate the needed sums for 
lag 2, and (c) the file is passed through the 602-A, and the new answer card is 
punched. This procedure is repeated for as many lags as are to be examined 
in the series of measures. 
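
The quantities accumulated on each answer card, and the raw-score formula applied to them, can be sketched in a few lines. This is an illustration only: the function names are hypothetical, a plain Python list stands in for the deck of cards, and the correlation is computed directly rather than through its square and a table look-up.

```python
import math


def lag_sums(series, lag):
    """Return n, sum X, sum Y, sum X^2, sum Y^2, sum XY for one lag (lag >= 1).

    X is each measure and Y is the measure `lag` positions later, so the
    number of pairs is N minus the lag number.
    """
    xs = series[:-lag]
    ys = series[lag:]
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return n, sum_x, sum_y, sum_xx, sum_yy, sum_xy


def lag_correlation(series, lag):
    """Pearson product-moment correlation for one lag by the raw-score formula."""
    n, sx, sy, sxx, syy, sxy = lag_sums(series, lag)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return numerator / denominator


sums_example = lag_sums([1, 2, 3, 4, 5], 1)
r_rising = lag_correlation([1, 2, 3, 4, 5], 1)
r_zigzag = lag_correlation([1, 3, 2, 4], 1)
```

A constant series gives a zero denominator; the sketch does not guard against that case.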

When all the correlation answer cards have been assembled, they are put
through the 602-A twice more, using two panels which compute the square of
the Pearson product-moment coefficient, using the raw score formula. In the
process, the squared correlations are punched into available unused columns
on the answer cards. The answer cards are then merged with a table of cards
containing all possible values of r, and using the interspersed master-card
gang-punching principle again, the proper r is found and punched directly
on the answer cards. The answer cards are then ordered appropriately and a
listing of the correlations is prepared on the Tabulator.


An Example 


As an example of the use of this procedure we may cite the intraserial 
correlations computed for lags 1 through 10 for some brightness matching 
data. These data were available as two-digit scale readings. They were 
punched, 20 readings per card, beginning in column 23. Suitable identification 
was punched in columns 63-67. Different series of the data varied in the num- 
ber of readings, but for a series of 120 readings, for example, it was necessary 
to punch six cards. The first card of each series was identified with “X”’ 
punches for interspersed master-card gang-punching. The Collator was used 
to insert 19 blank cards behind each original card except the last one of the 
series which was followed by 29. 

Columns 3 through 62 of the punch brushes of the Gang Punch were 
wired, in order, into columns 1 through 60 of the punch magnets so as to 
accomplish off-set gang-punching from each card into the next. The columns 
containing identification were wired to gang-punch normally and the “PX” 
hubs wired for interspersed master-card gang-punching. The Gang Punch 
read the values on each card and punched them into positions two columns 
to the left on the succeeding card. By the time the first nineteen inserted cards 
had been punched, the data were displaced to the left far enough so that 
columns 1 through 22 on the next card, the second original card, were punched 
with values which serially preceded those in columns 23 through 62 on that 
card (as in Fig. 1). This off-set punching was continued until the last two 
responses of the data were punched in columns 1, 2, 3, and 4 of the last card 
in the deck. 

After all the cards had been passed through the Gang Punch, the first 11 
of each series (the original punched card and the first ten inserted cards) were 
removed and discarded as they did not contain punches in columns 1 and 2.
At this point, each series deck was as follows: the first card (inserted card no. 
11 in Fig. 1) contained the first through the 20th reading in the first 40 col- 
umns plus appropriate identification. The second card contained readings 2 
through 20 in the first 38 columns. The last card contained the last two read- 
ings in the first four columns. It was now possible to obtain lag 1 correlation 
by working with columns 1 and 2 for the X value and columns 3 and 4 for the 
Y value, through the deck of cards. Measures 1 and 2 were in these columns 
on the first card, measures 2 and 3 on the second card, etc. 
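
The collating and off-set gang-punching steps can be checked with a small simulation. The following Python sketch is an illustration only, not part of the original procedure: it assumes a single series of 120 made-up two-digit readings, treats each card as a list of 62 column characters, and ignores the identification columns. All function and variable names are hypothetical.

```python
BLANK = " "
N_COLS = 62      # columns 1-62 are the only ones modelled here
DATA_COL = 23    # first key-punched data column on an original card
PER_CARD = 20    # two-digit readings per original card


def key_punch(readings):
    """Punch 20 two-digit readings per card, beginning in column 23."""
    cards = []
    for start in range(0, len(readings), PER_CARD):
        card = [BLANK] * N_COLS
        digits = list("".join(f"{r:02d}" for r in readings[start:start + PER_CARD]))
        card[DATA_COL - 1:DATA_COL - 1 + len(digits)] = digits
        cards.append(card)
    return cards


def collate(originals):
    """Insert 19 blank cards behind each original card, 29 behind the last."""
    deck = []
    for k, card in enumerate(originals):
        deck.append(card)
        n_blank = 29 if k == len(originals) - 1 else 19
        deck.extend([BLANK] * N_COLS for _ in range(n_blank))
    return deck


def gang_punch(deck):
    """Copy columns 3-62 of each card into columns 1-60 of the next card."""
    for prev, cur in zip(deck, deck[1:]):
        for col in range(1, 61):        # punch-magnet column, 1-indexed
            ch = prev[col + 1]          # brush column col + 2 (0-indexed col + 1)
            if ch != BLANK:
                cur[col - 1] = ch
    return deck


def field(card, first_col):
    """Two-digit value starting at first_col, or None if the field is blank."""
    text = "".join(card[first_col - 1:first_col + 1])
    return int(text) if text.strip() else None


readings = [(7 * i + 3) % 100 for i in range(120)]   # made-up data, assumption
deck = gang_punch(collate(key_punch(readings)))
kept = deck[11:]   # discard the original card and the first ten inserted cards

print(len(deck), len(kept))                      # 130 119
print(field(kept[0], 1), field(kept[0], 3))      # readings 1 and 2
print(field(kept[-1], 1), field(kept[-1], 3))    # readings 119 and 120
```

Under these assumptions the retained deck has 119 cards, card i holds reading i in columns 1 and 2, and each card carries at least the eleven consecutive readings needed for lags 1 through 10.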

FIGURE 1
Example of Procedure for Ten Lags With Two Digit Scores*
[Diagram: card number (rows) by card column number (columns).]

*Measures which are key-punched on original cards are those enclosed in frames.
All other entries on inserted and original cards appear during the
gang-punching operation. For simplicity, punched identification entries are
omitted from this diagram.

The 602-A Calculating Punch calculated and punched the N, sums,
sums of squares, and sums of cross products for each correlation into an
appropriately identified answer card that was inserted at the end of each
series. After all the correlation components had been obtained for a particular
lag, the file of cards was put through the Collator to remove the old answer
cards, insert answer cards for the next lag, and remove those score cards which
were no longer appropriate. These inappropriate score cards were those which
did not contain punches in all the columns being considered for the new Y
values. This calculating and collating process was repeated for each of ten
lags. To check on the efficiency of the Collator, it was found to be worthwhile
to examine visually and to count all rejected score cards.
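
The quantities punched into each answer card can be sketched in Python as follows. This is an illustrative equivalent, not the wiring of the Calculating Punch: it works directly from a list of readings rather than from card columns, and the function name, dictionary keys, and sample data are assumptions. Dropping the last `lag` X values corresponds to removing the score cards that have no punches in the Y columns for that lag.

```python
def lag_components(readings, lag):
    """Return N, sums, sums of squares, and sum of cross products for one lag."""
    n = len(readings) - lag
    if lag < 1 or n < 1:
        raise ValueError("lag must be at least 1 and shorter than the series")
    xs = readings[:n]      # X: value in columns 1-2 of each retained card
    ys = readings[lag:]    # Y: value `lag` readings later in the series
    return {
        "N": n,
        "sum_x": sum(xs),
        "sum_y": sum(ys),
        "sum_x2": sum(x * x for x in xs),
        "sum_y2": sum(y * y for y in ys),
        "sum_xy": sum(x * y for x, y in zip(xs, ys)),
    }


series = [(7 * i + 3) % 100 for i in range(120)]   # made-up readings, assumption
answer_cards = {lag: lag_components(series, lag) for lag in range(1, 11)}
print([answer_cards[lag]["N"] for lag in range(1, 11)])
```

For a series of 120 readings the N values run from 119 at lag 1 down to 110 at lag 10.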

At this point, each answer card contained the components necessary for 
computing the correlation coefficient for a particular lag. These computations 
were carried out as described above, and the resulting r’s and components 
were tabulated. 
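
As an illustration of this last step, the sketch below computes r from the components on one answer card. It assumes the usual raw-score form of the product-moment correlation, r = (NΣXY − ΣXΣY) / sqrt[(NΣX² − (ΣX)²)(NΣY² − (ΣY)²)], which may differ in arrangement from the computing formula described earlier in the paper. The function name and argument names are hypothetical.

```python
import math


def correlation_from_components(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Product-moment r from N, sums, sums of squares, and cross products."""
    numerator = n * sum_xy - sum_x * sum_y
    var_x = n * sum_x2 - sum_x ** 2     # N squared times the variance of X
    var_y = n * sum_y2 - sum_y ** 2     # N squared times the variance of Y
    if var_x <= 0 or var_y <= 0:
        return None                     # r is undefined for a constant series
    return numerator / math.sqrt(var_x * var_y)


# Components for X = 1, 2, 3 and Y = 2, 3, 4 (a perfectly linear pair).
r = correlation_from_components(3, 6, 9, 14, 29, 20)
print(round(r, 6))
```

Returning None for a zero variance is a design choice of this sketch; it keeps a constant series from raising a division error.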

Calculation Times 

The time needed on each machine to calculate 280 correlations with N’s 
ranging from 119 to 110 (120 original responses correlated for 10 lags) in the 
study used in this example is summarized below. 





Key Punch        .5 hour
Calculator     26.5 hours
Collator        3.0 hours
Sorter          1.0 hour
Tabulator        .5 hour
Gang Punch      3.0 hours
Total time     34.5 hours

Discussion


This computing procedure provides no machine check on results obtained
at any stage. In the work cited above, suspicious coefficients were recalculated
through the entire procedure and, as a further check, a few non-suspicious
coefficients were randomly selected and calculated on a desk calculator. None of
these checked coefficients or components was found to be in error. The
procedure has the advantage that all of the components of each correlation
coefficient are punched into one “answer card,” which makes it easy to use
these values in other calculations where they may be needed.

For data where each measure is known very precisely and contains a 
large number of significant digits, the procedure outlined by Schipper and 
Gruenberger (5) using a more powerful calculator is probably more desirable 
than the present one. However, in most psychological research the number of 
significant digits in each measure is small. In this case the difference in time 
per correlation between the two procedures does not warrant the use of the 
more high-powered calculator. The present procedure, for reasons of ready 
availability of the equipment needed, simplicity, and ease of understanding, 
is probably the more satisfactory one for most psychological research. 







REFERENCES 


1. Abelson, R. P. Spectral analysis and the study of individual differences in the
performance of routine, repetitive tasks. Princeton: Educ. Test. Serv., 1953.

2. Flynn, J. P. Lack of randomness in sequences of auditory differential threshold data.
Amer. Psychologist, 1948, 3, 254. Abstract.

3. Grant, D. A. The discrimination of sequences in stimulus events and the transmission of
information. Amer. Psychologist, 1954, 9, 62-68.

4. Hartley, H. O. The application of some commercial calculating machines to certain
statistical calculations. Supp. J. roy. statist. Soc., 1946, 8, 154-183.

5. Schipper, L. M. and Gruenberger, F. A method of calculation of serial correlation
coefficients utilizing the IBM Card-Programmed Electronic Calculator. Res. Bull. 53-10,
6564th Research and Development Group, HRRC, Air Research and Development
Command, Lackland Air Force Base, Texas, May, 1953.

6. Verplanck, W. S., Collier, G. H., and Cotton, J. W. Nonindependence of successive
responses in measurements of the visual threshold. J. exp. Psychol., 1952, 42, 273-282.


Manuscript received 12/31/53 


Revised manuscript received 4/12/54 










